• Coding
  • [Exercise] Parsing S-expressions

Today's exercise consists of a gentle introduction to the problems of parsing input text.
You are asked to write a calculator that deals with S-expressions.

S-expressions
If you know Lisp, you already know what S-expressions are and you are probably ready to work on the exercise.
For those of you who don't, here's a (very) simple introduction.

S-expressions are nothing more than nested lists represented like this:
(elt1 elt2 elt3 ...)
where each element can be any of:

- an atom.
- another S-expression.

Atoms are the primitives of the language. In the case of our calculator we will consider
- any integer
- fundamental arithmetic operations (+, -, *, /)

One more thing you need to know is that the first element of the expression will be considered the "operation" and the rest are the "operands" (or the arguments of the function depending how you look at it).

Let's take a look at some examples to make things clearer. Here are some valid S-expr (and their computed values as comments):
(+ 5 5) # equals 10
(+ (- 3 2) (* 9 2)) # equals 19
(/ 24 6 2) # equals 2
(/ 24 (/ 6 2)) # equals 8
As mentionned, the goal of the exercise is to write a program (in the language of your choice) that reads S-expressions and outputs their computed values.

A few notes:

- Operations can have any number of arguments.
- The order of evaluation matters for non-associative operations. Notice the difference between my 3rd and my 4th example.
- There is no limit to the depth of recursion (except the actual limits on your computer).
- We won't deal with error handling (invalid S-expr, division by zero, ...). You can assume that the input is always valid (let's not clutter the code uselessly).
- Whitespaces can be any number of space characters, tabs and newlines. The following is a perfectly valid (although terribly presented) S-expression:
(+ ( *   4 5)
   (   -   12       2) 
   (/ 45   (- 9 
                4))
   (* (+ 100 2)
       (/ 36 3)))
The presentation I did of S-expr is extremely "light". For a more rigorous definition, take a look at the Wikipedia article linked above. However, for the curious, here's a bit of formalism that's easy to grasp:
The above syntax is sometimes called "prefix notation" because the operator comes first. Similarly you can find "postfix notations" and "infix notations".

Here's the same expression expressed pre, in and postfix:

- prefix: + 1 2
- infix: 1 + 2
- postfix: 1 2 +

You may be familiar with infix notations exclusively, but it's good to know that the other two exist, and despite what you may think, are widely used in certain fields.

Bonus points
If you want to push the exercise further, here's what you can add to it:

- Proper error handling and displaying nice messages to the user in case of problems.
- Allow floating point numbers (like "5.3") as an atomic primitive of the language.
- Add more operators like power(^), modulo(%), trigonometric operations, ...

Final note
I hope the presentation of the problem is clear. If you have any doubt or any question do not hesitate to ask!
worked for all the inputs given:
<?php
function calc($s)
{
	$s=explode(" ",substr($s,1,-1));
	$t=$s[1];
	for($i=2;$i<count($s);$i++)
		$t.=$s[0].$s[$i];
	return eval("return ".$t.";");
}
function evaluate($S)
{
	$pattern="/\([^\(]*?\)/";
	$S = preg_replace('/\s\s+/', ' ', $S);
	while(preg_match_all($pattern,$S, $res))
	{
		for($i=0;$i<count($res[0]);$i++)
			$S=str_replace($res[0][$i],calc($res[0][$i]),$S);
	}
	return $S;
}
echo evaluate("(/ 24 (/ 6 2))");
?>
Here's my solution in less than 200 lines of C.
It was a good opportunity to work with function pointers and to practice my old rusty C skills as well.

The code segfaults for the second given example (+ (- 3 2) (* 9 2)) but works well for all the rest. I'll look more closely tonight to fix the bug.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define NUMBER '0'
#define OPERATOR '+'
#define MAX_NUM_SIZE 100
#define MAX_DEPTH 100


typedef double (*doublefun_t) ();

double add (double a, double b) { return a+b;}
double sub (double a, double b) { return a-b;}
double mul (double a, double b) { return a*b;}
double dvs (double a, double b) { return a/b;}


typedef struct args args_t;
struct args {
    double value;
    args_t *next;
};

typedef struct sexpr sexpr_t;
struct sexpr {
    char operation;
    args_t *arguments;
};

    
sexpr_t sstack[MAX_DEPTH];
/*
  Initial value is -1 because the stack is empty.
  Will be incremented to 0 by the first opening paren.
*/
int current_level = -1;

double final_result = 0;

int getop(char s[]);
void create_sexpr();
void add_operation(char op);
void add_argument(double a);
void evaluate_sexpr();


int main(int argc, char *argv[])
{
    int type;
    char s[MAX_NUM_SIZE];

    while ((type = tokenize(s)) != EOF) {
        switch(type) {
        case '(':
            create_sexpr();
            break;
        case OPERATOR:
            add_operation(s[0]);
            break;
        case NUMBER:
            add_argument(atof(s));
            break;
        case ')':
            evaluate_sexpr();
            break;
        default: break; /* Purposfully ignoring error handling */
        }

        if (current_level < 0)
            break;
    }

    printf("%f\n", final_result);
    
    return 0;
}

/*
  Parses input from stdin.
  returns NUMBERS for numbers or ascii value for any of ( ) + - * /
*/

int tokenize(char s[])
{
    int c;
    static int buf = EOF;

    if (isalnum(buf)) {
        c = buf;
        buf = EOF;
        return c;
    }
    
    if (buf == EOF || buf == ' ' || buf == '\t') 
        while ((*s = c = getchar()) == ' ' || c == '\t')
            ;
    else 
        *s = c = buf;

    buf = EOF;
    *(s + 1) = '\0';

    if (c == 42 || c == 43 || c == 45 || c == 47)
        return OPERATOR;
    
    if (!isdigit(c) && c != '.')
        return c;       /* not a number */

    if (isdigit(c))     /* collect integer part */
        while (isdigit(*++s = c = getchar()))
            ;
    if (c == '.')       /* collect fraction part */
        while (isdigit(*++s = c = getchar()))
            ;
    *s++ = '\0';
    buf = c;

    return NUMBER;
}

/*
  Create new sexpr and put it on the sstack.
  increment current_level index
*/
void create_sexpr()
{
    sexpr_t *new = malloc(sizeof(sexpr_t));
    new->arguments = NULL;
    sstack[++current_level] = *new;
}


void add_operation(char op)
{
    sstack[current_level].operation = op;
}

void add_argument(double a)
{
    args_t *new_argument = malloc(sizeof(args_t));
    args_t *args_iterator = sstack[current_level].arguments;
    
    new_argument->value = a;
    new_argument->next = NULL;

    if (args_iterator == NULL)
        sstack[current_level].arguments = new_argument;
    
    else {
        while (args_iterator->next != NULL)
            args_iterator = args_iterator->next;

        args_iterator->next=new_argument;
    }
}

void evaluate_sexpr()
{
    char op = sstack[current_level].operation;
    doublefun_t f = NULL;

    /* variable holders used for the accumulation
     */
    double a, b;

    args_t *new_argument = NULL;
    args_t *args_iterator = sstack[current_level].arguments;

    a = args_iterator->value;

    switch(op) {
    case '+':
        f = &add;
        break;
    case '-':
        f = ⊂
            break;
    case '*':
        f = &mul;
        break;
    case '/':
        f = &dvs;
        break;
    }
        
    while (args_iterator->next) {
        b = args_iterator->next->value;
        a = (*f)(a, b);
        args_iterator = args_iterator->next;
    }

    if (--current_level >= 0) {
        new_argument = malloc(sizeof(args_t));
        new_argument->value = a;
        new_argument->next = NULL;

        if (sstack[current_level].arguments == NULL) {
            sstack[current_level].arguments = new_argument;
        } else {
            args_iterator = sstack[current_level].arguments;

            while (args_iterator->next != NULL)
                args_iterator = args_iterator->next;

            args_iterator->next= new_argument;
        }
    }
    else {
        final_result = a;
    }
}
PS: I love C.

EDIT: Updated with the bug fix. The code should work now.
a year later
Here are two with a bit more flexibility. You can pass in your own operator map.

To see a full running example, you can click here.

Each one is built in a different way.
// Define : Compute through regex match
var compute_with_regex = function(S, map){
    var result = 0;
    while(group = S.match(/\(([^()]+)\)/)){
        var original = group[0];
        var tokens = group[1].split(' ');
        var filtered = _(tokens).filter(function(e){ return e !== ''; });
        if(!map[filtered[0]]) throw Error('Operator not supported: ' + filtered[0] );
        result = map[filtered[0]].apply(
            this, _(filtered.slice(1)).map(
                function(e){ return parseFloat(e); }
            ));    
        S = S.replace(original, result);
    }
    return result;
};
// Define : Compute with stream feeding
var compute_with_stream = function(S, map){
    var is_white = function(what) { return ' \n\t\r'.indexOf(what) > -1; };
    var stream = S.split('');
    var token_stack = [];
    var index_stack = [];
    var current_token = [];
    
    while(chr = stream.shift()){
        if(chr === '(') {
            index_stack.push(token_stack.length);            
        } else if(chr === ')') {
            if(current_token.length){
                token_stack.push(current_token.join(''));
                current_token = [];
            }

            var from = index_stack.pop();
            var args = token_stack.slice(from);
            var result = map[args[0]].apply(
            this, _(args.slice(1)).map(
                function(e){ return parseFloat(e); }
            ));    
            token_stack = token_stack.slice(0, from);
            token_stack.push(result);
        } else if(is_white(chr) && current_token.length){
            token_stack.push(current_token.join(''));
            current_token = [];
        } else if(!is_white(chr)){
            current_token.push(chr);   
        }
        
    }
    return token_stack[0];
};
7 days later
Python solution with regex
sexpr.py
import re
import operator



op = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.div,
    "%": operator.mod,
    "^": operator.pow
    }


patterns = re.compile("(?P<left_bracket>\()|"
                    "(?P<right_bracket>\))|"
                    "(?P<num>\-?\d+\.\d+|\-?\d+)|"
                    "(?P<operator>[\+\-\*\/\%\^])")



def compute(exp):
    stack = []
    out = []
    for match in re.finditer(patterns, exp):
        items = match.groupdict().items()
        ctype, value = next((t, v) for t, v in items if v)
        if ctype == 'left_bracket':
            stack.append(out)
            out = []
        elif ctype == 'right_bracket':
            v = reduce(op[out[0]], out[1:])
            out = stack.pop()
            out.append(v)
        elif ctype == 'num':
            v = float(value)
            out.append(v)
        elif ctype == 'operator':
            out.append(value)
        else:
            print "Invalid character"

    r = out[0]
    return int(r) if int(r) == r else r
test.py
import sexpr

tests = ["(+ 5 5)",
        "(+ (- 3 2) (* 9 2))",
        "(/ 24 6 2)",
        "(/ 24 (/ 6 2))",
        "(/ 10 3)",
        "(+ (^ 2 (^ 3 1)) 4 (^ 5 2))"]

for exp in tests:
    print exp + " = " + str(sexpr.compute(exp))
Output
(+ 5 5) = 10
(+ (- 3 2) (* 9 2)) = 19
(/ 24 6 2) = 2
(/ 24 (/ 6 2)) = 8
(/ 10 3) = 3.33333333333
(+ (^ 2 (^ 3 1)) 4 (^ 5 2)) = 37
Ayman, a few style remarks about your code:
  • You should include more 2 empty lines after the imports, and 2 empty lines before the first function definition.
  • Spaces around operators, one space after punctuation ("," and ":").
  • Instead of including your tests in a separate file, you could use comments, and especially doctest. Reviewers love doctests because they're easy to play with.
  • rb, lb and out aren't very descriptive variable names. I don't like tmpout; every variable is temporary.
  • The block containing tmpout (line 29-31) is awkward. I rewrote it underneath.
        elif ctype == 'rb':
            v = reduce(op[out[0]], out[1:])
            out = stack.pop()
            out.append(v)
Find first occurrence
I seriously dislike this:
ctype, value = [(t,v) for t,v in items if v][0]
for several reasons:
  • You're traversing all of items. It's unlikely that it's going to be very long in the case of an S-expr calculator, but it's highly unnecessary. You want to stop after you found your first match anyway.
  • You're creating a new list. New object creation is costly and again unnecessary.
  • The [0] isn't descriptive at all
There are other ways to get the first occurrence of your item. Here's how I do it:
ctype, value = next((t, v) for t, v in items if v)
Not only is it (a tiny bit) more descriptive, but it's also evaluated lazily and avoids creating a new list.
Thanks for the review rahmu that was very helpful and makes sense. The code was to an extent half-assed. I updated it now to reflect the fixes. Thank you for mentioning doctest, I'll look into it, haven't used it yet.