• Coding
  • Exercise - Brainfuck Interpreter.

Yes, you read it right, we're going to code a brainfuck interpreter. It's not as difficult as it sounds, although probably will be one of the most interesting exercises we've had so far. But first a small announcement:


The rise and (quick) fall of the Lebgeeks Programming Competition:
Most of you here in the Programming section have witnessed the shy attempt at setting up the Lebgeeks Programming Competition. And you also saw how it failed :'( There are plenty of reasons why it did, and it'd be useless to discuss them. However, no one will deny that the "Exercises" series had a certain success. So I say, let's stick to that for now. It would be nice this time though, if we could avoid the duplicity of exercises and all focus on one at a time.


Coding a brainfuck interpreter:

Language: C or C++
Topics: Files, pointers.
Difficulty: Medium


What is Brainfuck?
Brainfuck is a cool (and very useless) programming language. For those of you who aren't familiar with it, I invite you to check the Wikipedia page for more info. But here's the gist of it:

Brainfuck is a language made up of 8 characters:

> increment the data pointer (to point to the next cell to the right).
< decrement the data pointer (to point to the next cell to the left).
+ increment (increase by one) the byte at the data pointer.
- decrement (decrease by one) the byte at the data pointer.
. output a character, the ASCII value of which being the byte at the data pointer.
, accept one byte of input, storing its value in the byte at the data pointer.
[ if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it
forward to the command after the matching ] command.
] if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it
back to the command after the matching [ command.

By brainfuck standards, the data pointer can only get access to 30 000 bytes (think of 30 000 bytes arrays of memory). The pointer initially starts at the beginning of the array, and according to the set of instructions seen above, the user can manipulate it freely. If the pointer gets out of bounds, it goes back to the other side of the array. For instance if it is at position 0 and we decrement it (with "<"), it will go to byte 30 000.


The exercise
You are asked to write a brainfuck interpreter. Your program should take a brainfuck source code as an input and interpret it.
To check if your program is working, you could test it on the Hello World program:
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
If it seems weird or unclear, don't hesitate to ask any question.

PS: This exercise is particularly aimed at Padre who was complaining that my last exercises were too ... "beauty contest". Padre, I'm expecting to see your code here :)
I'm not sure I quiet understood the intention of the exercise. Brainfuck is supposed to output something? Like words maybe? Correct me if I'm wrong, but is the code block you posted a representation of "Hello World" in Brainfuck language?
Brainfuck is a language just like C, PHP, Java, Haskell, Ada, OCaml, Python, Perl and ksh. In other words, it is "a way of telling the computer what we want him to do". The program you're aksed to write should translate brainfuck programs into native binaries that your computer will understand (and execute).

Unlike compiled languages, your program should translate the instructions written in Brainfuck only at execution time. This is the exact same behavior as PHP, Python, Perl and, to some extent, Java and C#.

Fortunately, the brainfuck instructions are very self descriptive, and easily implemented in C or C++ (move a pointer, increment a value, ...).
Brainfuck is supposed to output something? Like words maybe?
Is PHP supposed to output something like words maybe? It's the exact same thing. Brainfuck is a programming language. You can do anything with it. As a matter of fact, Brainfuck is often called a Turing tarpit, meaning it's an incredibly powerful language that allows you to do virtually anything, but the implementation of the simplest easiest things is a nightmare.
Correct me if I'm wrong, but is the code block you posted a representation of "Hello World" in Brainfuck language?
You are correct.
Fucking amazing.
#include <iostream>
using namespace std;

typedef unsigned char byte;
typedef unsigned int uint;

const int MEM_SIZE = 30000;
byte bytes[MEM_SIZE] = {0};

int main(){
	int d = 0;
	uint p = 0;

	char* bf =
		"++++++++++[>+++++++>++++++++++>+++>+<<<<-]"
		">++.>+.+++++++..+++.>++.<<+++++++++++++++."
		">.+++.------.--------.>+.>.";

	while(bf[p] != 0){
		switch(bf[p]){
		case '+':
			bytes[d]++;
			break;
		case '-':
			bytes[d]--;
			break;
		case '<':
			d = (d-1+30000) % 30000;
			break;
		case '>':
			d = (d+1+30000) % 30000;
			break;
		case '.':
			cout << bytes[d];
			break;
		case ',':
			cin >> bytes[d];
			break;
		case '[':
			{
				int count = 1;
				if(bytes[d] == 0){
					p++;
					while(count != 0){
						if(bf[p]=='[')
							count++;

						if(bf[p]==']')
							count--;

						p++;
					}
				}
			}
			break;
		case ']':
			{
				int count = 1;
				if(bytes[d] != 0){
					p--;
					while(count != 0){
						if(bf[p]==']')
							count++;

						if(bf[p]=='[')
							count--;

						p--;
					}
				}
			}
			break;
		}
		p++;
	}

	return 0;
}
My program has been tested and found to be faulty in certain cases. I hope this different implementation will highlight the problem if it really exists.

Another implementation of the stacking follows:
#include <iostream>
using namespace std;

typedef unsigned char byte;
typedef unsigned int uint;

const int MEM_SIZE = 30000;
byte bytes[MEM_SIZE] = {0};

void parse(char *bf, uint &p, int dir, char open, char close){
	for(;;) {
		p += dir;
		if(bf[p] == open)
			parse(bf, p, dir, open, close);
		else if(bf[p] == close)
			return;
	}
}


int main(){
	int d = 0;
	uint p = 0;

    char* bf =
        "++++++++++[>+++++++>++++++++++>+++>+<<<<-]"
        ">++.>+.+++++++..+++.>++.<<+++++++++++++++."
        ">.+++.------.--------.>+.>.";


	while(bf[p] != 0){
		switch(bf[p]){
		case '+':
			bytes[d]++;
			break;
		case '-':
			bytes[d]--;
			break;
		case '<':
			d = (d-1+30000) % 30000;
			break;
		case '>':
			d = (d+1+30000) % 30000;
			break;
		case '.':
			cout << bytes[d];
			break;
		case ',':
			cin >> bytes[d];
			break;
		case '[':
			{
				int count = 1;
				if(bytes[d] == 0)
					parse(bf, p, +1, '[', ']');
			}
			break;
		case ']':
			{
				int count = 1;
				if(bytes[d] != 0)
					parse(bf, p, -1, ']', '[');
			}
			break;
		}
		p++;
	}

	return 0;
}
Goodness you code fast, arithma. The all-purpose paren-matching parse function is especially clever. I'd like to take a crack at making an interactive interpreter in C if I can. I'll post the code tomorrow when I'm done.
Am I hogging all the fun? We can set a non-spoiler time before which we shouldn't post (though I wouldn't get the "Goodness you code fast" thingy, which I appreciated :).
I pledge I will read the code for anyone who posts it in these exercises even if I post beforehand (within what time allows).
I always read all the codes that are posted on these exercises. I encourage all of you to do the same. But I disagree with non-spoiler times. It would kill certain threads super fast.
Yeah, I think it's OK to post as soon as you're done. No need for non-spoiler times :)
There's a special place in hell reserved for me for creating this code, but I'm posting it anyway!

This is a brainfuck interpreter. It waits for you to enter your brainfuck instructions, and when you press enter, it will execute them and print the output (if any), and then wait for your next instruction. I've written out each brainfuck symbol as its own function even though that's not really necessary, but I just wanted to leave room for some future meddling with the code. Everything else happens in the eval() function, which is almost identical to arithma's original code.

There's a lot of code repetition here the can be eliminated but I'm kind of lazy!

... Turing must be spinning in his grave.
#include <stdio.h>
#include <string.h>

#define INPUT_SIZE 1024
#define BF_BUFFER_SIZE 30000

unsigned char bf_buffer[BF_BUFFER_SIZE];
int bf_pointer;

void right() {
  bf_pointer = (bf_pointer + 1) % BF_BUFFER_SIZE;
}

void left() {
  bf_pointer = (bf_pointer - 1) % BF_BUFFER_SIZE;
}

void inc() {
  bf_buffer[bf_pointer]++;
}

void dec() {
  bf_buffer[bf_pointer]--;
}

void output() {
  printf("%c", bf_buffer[bf_pointer]);
}

void input() {
  bf_buffer[bf_pointer] = getchar();
}

int jz(char * buffer, int pointer) {
  if (bf_buffer[bf_pointer] == 0)
    while(++pointer < INPUT_SIZE)
      if (buffer[pointer] == ']')
	return pointer;
      return pointer;
}

int jnz(char * buffer, int pointer) {
  if (bf_buffer[bf_pointer] != 0)
    while(--pointer >= 0)
      if (buffer[pointer] == '[')
	return pointer;
      return pointer;
}

void eval(char * buffer)
{
  int input_pointer = 0;
  while (input_pointer < INPUT_SIZE) {
    switch (buffer[input_pointer]) {
      case '>': right();
      break;
      
      case '<': left();
      break;
      
      case '+': inc();
      break;
      
      case '-': dec();
      break;
      
      case '.': output();
      break;
      
      case ',': input();
      break;
      
      case '[': input_pointer = jz(buffer, input_pointer);
      break;
      
      case ']': input_pointer = jnz(buffer, input_pointer);
      break;
      
      case '\n':
	return;
	break;	
    }
    
    input_pointer++;
  }
}

int main(int argc, char * argv[])
{
  char buffer[INPUT_SIZE];
  memset(bf_buffer, 0, BF_BUFFER_SIZE);
  bf_pointer = 0;
  while (1) {
    printf("brainfuck:: ");
    fflush(stdout);
    if (fgets(buffer, INPUT_SIZE, stdin) == NULL)
      return 0;
    eval(buffer);
  }
  
  return 0;
}
saeidw wroteGoodness you code fast, arithma.
I second that. The specs of the language alone were enough to discourage me. Kudos to arithma.
@saeidw: Are you sure of your jz and jnz implementation? It does not seem to account for consecutive open brackets.
The way I understood it, the brainfuck interpreter must match the open brackets with closed ones. You seem to be just finding the first bracket of the other kind.
@arithma: you're right! I actually realized that when I was reading your code (the parse() function) but I completely ignored it when I was writing this, so I fell into that trap. Here are the fixed jz() and jnz() functions now with extra inner looping yumminess!
int jz(char * buffer, int pointer) {
    int inner = 1;
    int temp = pointer;
    while(++pointer < INPUT_SIZE) {
      if (buffer[pointer] == '[') inner++;
      if (buffer[pointer] == ']') {
	if (--inner == 0) break;
      }
    }
    if (inner != 0) {
      printf("Error: unmatched '['\n"); 
      pointer = INPUT_SIZE-1;
    }
    if (bf_buffer[bf_pointer] == 0) return pointer;
    return temp;
  }

  int jnz(char * buffer, int pointer) {
    int inner = 1;
    int temp = pointer;
    while(--pointer >= 0) {
      if (buffer[pointer] == ']') inner++;
      if (buffer[pointer] == '[') {
	if(--inner == 0) break;
      }
    }
    if (inner != 0) {
      printf("Error: unmatched ']'\n");
      pointer = INPUT_SIZE-1;
    }
    if (bf_buffer[bf_pointer] != 0) return pointer;
    return temp;
  }
PS: This exercise is particularly aimed at Padre who was complaining that my last exercises were too ... "beauty contest". Padre, I'm expecting to see your code here :)
ok ok , got it :P
just saw that :/, will code smth during the day.
int seek(char* buf, int p,int dir, char match, char umatch)
{
	int skip=0;
	while(1)
	{
		char cc=buf[p];
		if (buf[p]==umatch) skip++;
		if (buf[p]==match)
		{
			if (!skip) return p;
			if (skip) skip--;
		}
		p=p+dir;
	}
}

void bf(char *code)
{
	unsigned int p=0;
	unsigned int E=0;
	int bf_mem[BF_SIZE] = {0};

	while (code[E]!='\0')
	{
		switch (code[E])
		{
			case '+':
				bf_mem[p]++;
				break;
			case '-':
				bf_mem[p]--;
				break;
			case '>':
				p = p + 1 % BF_SIZE;
				break;
			case '<':
				p = p-1 % BF_SIZE;
				break;
			case '.':
				putchar(bf_mem[p]);
				break;
			case ',':
				bf_mem[p]=getchar();
				break;
			case '[':
				if(!bf_mem[p]) E=seek(code,E+1,1,']','[');
				break;
			case ']':
				if(bf_mem[p]) E=seek(code,E-1,-1,'[',']');
				break;
		  }
		E++;
	}
 }
this is my quick code. looks quite like arthima's code now that i compare them ... except that it doesn't have that sexy recursion :P
            case '>':
                p = p + 1 % BF_SIZE;
                break;
            case '<':
                p = p-1 % BF_SIZE;
                break;
% has higher precedence than + or -
This thing is bound to blow up.

I tried this code (which rahmu sent me). It worked well.
>++++++++++>+>+[
    [+++++[>++++++++<-]>.<++++++[>--------<-]+<<<]>.>>[
        [-]<[>+<-]>>[<<+>+>-]<[>+<-[>+<-[>+<-[>+<-[>+<-[>+<-
            [>+<-[>+<-[>+<-[>[-]>+>+<<<-[>+<-]]]]]]]]]]]+>>>
    ]<<<
]
It's brainfucking amazing that it even avoids overflows!
5 years later
More fun in Go:
package main

import (
	"errors"
	"fmt"
	"log"
)

func matching_paren_up(code string, pos int) (int, error) {
	nr := 0
	for ; pos < len(code); pos += 1 {
		switch code[pos] {
		case '[':
			nr++
		case ']':
			nr--
			if nr < 0 {
				return pos, nil
			}
		}
	}
	return 0, errors.New("Couldn't match [")
}

func matching_paren_down(code string, pos int) (int, error) {
	nr := 0
	for pos -= 1; pos > 0; pos -= 1 {
		switch code[pos] {
		case ']':
			nr++
		case '[':
			nr--
			if nr < 0 {
				return pos, nil
			}
		}
	}
	return 0, errors.New("Couldn't match ]")
}

func bf(bf_code string) {
	data := make([]int, 30)
	ptr, idx := 0, 0
	var err error

	for ; idx < len(bf_code); idx += 1 {
		switch bf_code[idx] {
		case '+':
			data[ptr] += 1
		case '-':
			data[ptr] -= 1
		case '>':
			ptr = (ptr + 1) % 30000
		case '<':
			ptr = (ptr - 1) % 30000
		case '.':
			fmt.Printf("%c", data[ptr])
		case ',':
			fmt.Scanf("%c", data[ptr])
		case '[':
			if data[ptr] == 0 {
				idx, err = matching_paren_up(bf_code, idx)
			}
		case ']':
			if data[ptr] != 0 {
				idx, err = matching_paren_down(bf_code, idx)
			}
		}
		if err != nil {
			log.Fatal(err)
		}
	}
}

func main() {
	bf_helloworld := "++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>."
	bf(bf_helloworld)
}