[Exercise] Mini Template Processor

xterm · September 28 2014

Intro

A template processor is a software component that, given an input and a context, renders an output based on that context. The input contains instructions as well as data used by the processor. These instructions are formally known as the template language.

Some popular template engines (and languages) can be found here.

Template engines are very common in web development frameworks and standard libraries. They're also used in most major IDEs and text editors to generate code for projects or snippets.

Examples

Let's start with some examples and extract the template language we're using from them:

Input:

Hello {{name}}!

Context: name: xterm
Output: Hello xterm!

-

Input:

Hello {{name}}! How's it going {{name}}?

Context: name: xterm
Output: Hello xterm! How's it going xterm?

-

Input:

Please give me {{num_apples}} apples and {{num_oranges}} oranges

Context: num_apples: 5, num_oranges: 3
Output: Please give me 5 apples and 3 oranges

-

In the previous examples you will notice that part of our template language is the ability to substitute {{variable}} with its corresponding value in the context (a feature most languages provide through string formatters or string interpolation). Let's add some control flow logic to our template language:

Input:

{% if adult %}
    You're an adult!
{% endif %}

Context: adult: true

If adult is true
Output: You're an adult!

If adult is false
Output: (nothing)

-

Input:

{% for todo in todolist %}
    {{todo}}<br>
{% endfor %}

Context: todolist: ['Buy food', 'Eat food', 'Sleep']
Output:

Buy food<br>Eat food<br>Sleep

-

Exercise

Your task in this exercise is to build a Mini Template Processor using the programming language of your choice. It must be able to render the following instructions:

{{variable}} # for substitutions
{% if cond %}content{% endif %} # for conditionals
{% for foo in bar %}content{% endfor %} # looping construct

To maintain some consistency across solutions in different languages, please consider adhering to the following programming interface:

    render(input, context) -> output

Here's sample usage:

    assert render('{{name}}', {'name': 'xterm'}) == 'xterm'
    assert render('{% if value %}sup{% endif %}', {'value': False}) == ''
    assert render('{% for i in nums %}{{i}}{% endfor %}', {'nums': range(10)}) == '0123456789'

P.S.1: The rendered output must preserve the whitespace provided in the input.
P.S.2: Extra context entries must not cause an error, but an improperly formatted template must.
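
As a rough illustration of the two notes above, here is a minimal sketch covering only the {{variable}} part of the language. The names render_variables and TemplateError are hypothetical, and treating an unknown variable as an error is an assumption (the exercise does not say what should happen). It only understands plain variable names, so filters and {% %} tags are out of scope.

```python
import re

# Matches {{ name }} with optional spaces around the variable name.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


class TemplateError(Exception):
    """Hypothetical error type for improperly formatted templates."""


def render_variables(template, context):
    """Substitute {{variable}} placeholders only; {% %} tags are left untouched."""
    # Remove every well-formed placeholder; any braces still left over
    # belong to a placeholder that was never opened or closed properly.
    leftover = VARIABLE.sub("", template)
    if "{{" in leftover or "}}" in leftover:
        raise TemplateError("improperly formatted {{ }} placeholder")

    def lookup(match):
        name = match.group(1)
        if name not in context:
            # Assumption: a variable missing from the context is an error.
            raise TemplateError("unknown variable: " + name)
        return str(context[name])

    # Text outside the placeholders, whitespace included, is copied as-is.
    return VARIABLE.sub(lookup, template)
```

With this sketch, keys in the context that the template never mentions are simply never looked up, while a template such as 'Hello {{name' raises TemplateError.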

Please don't let performance hinder the possibility of exotic solutions; this is meant as a fun exercise.

Bonus Points

Provide a value filter mechanism that you could apply on variables, examples:

{{ name|len }}

would output 5 for a context of name: xterm

{% for num in nums|even %}
    {{ num }}
{% endfor %}

would output 2468 for a context of nums: [1,2,3,4,5,6,7,8,9]

You can optionally pass parameters to a filter using the following syntax:

{{ variable|filter:param1,param2 }}

The programming interface for filters is as follows:

valuefilter(value, *params)
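
One possible way to wire up this interface is sketched below. The FILTERS registry, the apply_filters helper and the default filter are hypothetical names chosen for illustration; len and even come from the examples above. Parameters are passed to the filter as raw strings, and chaining several filters with | is an assumption that simply falls out of the loop.

```python
def filter_len(value):
    return len(value)


def filter_even(value):
    return [n for n in value if n % 2 == 0]


def filter_default(value, fallback):
    # Hypothetical parameterised filter: use fallback when value is empty.
    return value if value else fallback


# Hypothetical registry mapping filter names to valuefilter(value, *params).
FILTERS = {"len": filter_len, "even": filter_even, "default": filter_default}


def apply_filters(expression, context):
    """Evaluate 'variable|filter:param1,param2' against the context."""
    parts = [part.strip() for part in expression.split("|")]
    value = context[parts[0]]
    for spec in parts[1:]:
        name, _, raw_params = spec.partition(":")
        # Parameters stay strings; a filter converts them if it needs to.
        params = [p.strip() for p in raw_params.split(",")] if raw_params else []
        value = FILTERS[name.strip()](value, *params)
    return value
```

A renderer would call apply_filters on the text between {{ and }}, and on the list expression of a for statement, instead of looking the name up in the context directly.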

venam · October 3 2014

--

Last edited by venam (September 9 2016)

xterm · October 4 2014

Well done venam! The code has some problems but I didn't go as far as to fix them for you, so I leave that up to you. However I did do the following:

1) I built some test fixtures that should be a good base for you to run against your code. Ignore the magical runtime generation of tests and just focus on the outcome. http://pastie.org/9644179
2) I made some changes to your code that make parts of it a little more idiomatic.
3) I modified the code so that it's mostly PEP 8 compliant (as well as xterm-compliant :-)). If you'd like to see what changed, run a diff against your original file.

Run the tests using either of the following:

$ python test.py

$ python -m unittest tests

The test runner will run all your tests and report a final result. Alternatively, you can provide a switch (-f or --failfast) to make it fail on the first error it finds.
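
For readers curious about generating tests at runtime, here is one common way to do it with unittest. This is only a sketch and not necessarily how the linked fixtures work: the FIXTURES entries, the RenderTests class, the make_test and run_all helpers, and the stand-in render (which only handles {{variable}}) are all assumptions made so the example runs on its own.

```python
import unittest


def render(template, context):
    """Stand-in renderer (variables only); swap in the real render()."""
    for key, value in context.items():
        template = template.replace("{{" + key + "}}", str(value))
    return template


# Hypothetical fixtures: (name, template, context, expected output).
FIXTURES = [
    ("simple variable", "Hello {{name}}!", {"name": "xterm"}, "Hello xterm!"),
    ("repeated variable", "{{name}} and {{name}}", {"name": "xterm"},
     "xterm and xterm"),
    ("extra context", "{{a}}", {"a": 1, "b": 2}, "1"),
]


class RenderTests(unittest.TestCase):
    """Empty on purpose; test methods are attached below."""


def make_test(template, context, expected):
    # A factory gives each generated test its own copy of the fixture values.
    def test(self):
        self.assertEqual(render(template, context), expected)
    return test


for name, template, context, expected in FIXTURES:
    setattr(RenderTests, "test_" + name.replace(" ", "_"),
            make_test(template, context, expected))


def run_all():
    """Run the generated tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RenderTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the methods are attached to a regular TestCase subclass, python -m unittest picks them up like hand-written tests, and the -f switch applies to them as well.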

http://pastie.org/9619552

minitemplate.py

import re


def render(template, context):
    template = handle_if(template, context)
    template = handle_for(template, context)
    template = handle_word(template, context)
    return template


def handle_if(template, context):
    end_if = [1]
    start_if = [1]
    while len(end_if) != 0 and len(start_if) != 0:
        # we assume start_if and end_if have the same size
        start_if = [(m.start(0), m.end(0))
                    for m in re.finditer(r"\{%\s*if\s+\w+\s*%\}", template)]
        end_if = [(m.start(0), m.end(0))
                  for m in re.finditer(r"\{%\s*endif\s*%\}", template)]
        if len(start_if) == 0 or len(end_if) == 0:
            return template
        min_dist = end_if[0][1] - start_if[0][1]
        _start = start_if[0]

        # find the smallest distance between the first end_if
        # and the list of start_if
        for a in start_if:
            if abs(end_if[0][1] - a[1]) < min_dist:
                _start = a

        # we found the right start for the first endif
        if (find_condition(context, template[_start[0]:_start[1]])):
            if _start[0] == 0:
                template = (template[_start[1]:end_if[0][0]-1] +
                            template[end_if[0][1]:])
            else:
                template = (template[0:_start[0]-1] +
                            template[_start[1]:end_if[0][0]-1] +
                            template[end_if[0][1]:])
        else:
            if _start[0] == 0:
                template = template[end_if[0][1]:]
            else:
                template = (template[0:_start[0]] +
                            template[end_if[0][1]:])
    return template


def handle_for(template, context):
    end_for = [1]
    start_for = [1]

    while len(end_for) != 0 and len(start_for) != 0:
        # we assume start_for and end_for have the same size
        start_for = [(m.start(0), m.end(0))
                     for m in re.finditer(r"\{%\s*for\s+\w+\s+in\s+\w+\s*%\}",
                                          template)]
        end_for = [(m.start(0), m.end(0))
                   for m in re.finditer(r"\{%\s*endfor\s*%\}", template)]
        if len(start_for) == 0 or len(end_for) == 0:
            return template
        min_dist = end_for[0][1] - start_for[0][1]
        _start = start_for[0]
        # find the smallest distance between the first end_for
        # and the list of start_for
        for a in start_for:
            if abs(end_for[0][1] - a[1]) < min_dist:
                _start = a
        # we found the right start for the first end_for
        (forName, forList) = parse_for(context, template[_start[0]:_start[1]])
        _replace = []
        for value in forList:
            _ctx = {forName: str(value)}
            _replace.append(
                handle_word(template[_start[1]:end_for[0][0]], _ctx))

        temp = '' if _start[0] == 0 else template[0:_start[0]]

        for v in _replace:
            temp += v
        temp += template[end_for[0][1]:]

        template = temp
    return template


def find_condition(context, text):
    name = re.findall(r"if\s+(\w+)\s+", text)[0]
    return context[name]


def parse_for(context, forString):
    forList = re.findall(r"in\s+(\w+)\s*%\}", forString)[0]
    forName = re.findall(r"for\s+(\w+)\s+in", forString)[0]
    if context[forList]:
        return (forName, context[forList])
    else:
        return (forName, [])


def handle_word(template, context):
    for key, value in context.items():
        template = template.replace("{{"+key+"}}", str(value))
    return template

venam · October 4 2014

--

Last edited by venam (September 9 2016)

raja · October 13 2014

So I was bored today and thought I'd give this one a shot. This one passes all of xterm's tests. (I just banged the whole thing out in one go, tested the "tokenize" function separately a bit, and then ran the tests. It worked on the first try; I thought I was doing something wrong with the tests at first.)

Note: this doesn't cleanly handle malformed expressions, such as a for/if without the corresponding end, or a {% %} that contains something other than "for", "if", "endfor" or "endif". In the former case it will crash; in the latter it will just drop that entire statement.

Code below:

import re

#Token types
TEXT = "TEXT"
IF = "IF"
FOR = "FOR"
VAR = "VAR"
ENDIF = "ENDIF"
ENDFOR = "ENDFOR"

def tokenize(template):
    token_list = []

    current_token = ""
    i = 0
    while i < len(template):
        n = template.find("{", i)
        if n == -1:
            current_token += template[i:]
            token_list.append((TEXT, current_token))
            current_token = ""
            break

        current_token += template[i:n]
        
        if template[n+1] == "{":
            close = template.find("}}", n+1)
            if close == -1:
                current_token += template[n:n+2]
                i = n+2
                continue

            token_list.append((TEXT, current_token))
            current_token = ""
            var_name = template[n+2:close].strip()
            token_list.append((VAR, var_name))
            i = close+2
            continue

        if template[n+1] == "%":
            close = template.find("%}", n+2)
            if close == -1:
                current_token += template[n:n+2]
                i = n+2
                continue

            token_list.append((TEXT, current_token))
            current_token = ""
            statement_text = template[n+2:close].strip()

            if statement_text.startswith("if "):
                token_list.append((IF, statement_text[3:].strip()))
            elif statement_text.startswith("for "):
                for_text = statement_text[4:].strip()
                parts = [x for x in for_text.split(" ") if len(x) > 0]
                if len(parts) != 3 or parts[1] != "in":
                    # malformed statement, throw it away?
                    print("WARNING: malformed statement expression at: %d" % n)
                else:
                    token_list.append((FOR, parts))
            elif statement_text.startswith("endif"):
                token_list.append((ENDIF,None))
            elif statement_text.startswith("endfor"):
                token_list.append((ENDFOR,None))
            else:
                # malformed statement, throw it away?
                print("WARNING: malformed statement expression at: %d" % n)
                
            i = close + 2
            
    return token_list

def render(template, context):
    token_list = tokenize(template)
    for_stack = []
    rendered_string = ""
    i = 0
    while i < len(token_list):
        cmd,arg = token_list[i]
        if cmd == TEXT:
            rendered_string += arg
        elif cmd == VAR:
            rendered_string += str(context[arg])
        elif cmd == IF:
            if not context[arg]:
                # jump to the next endif after this token (nested ifs are not handled)
                i = token_list.index((ENDIF,None), i)
        elif cmd == FOR:
            var_name = arg[0]
            list_name = arg[2]
            for_stack.append((i, list_name, var_name, 0))
            context[var_name] = context[list_name][0]
        elif cmd == ENDFOR:
            token_idx, list_name, var_name, list_idx = for_stack.pop()
            l = context[list_name]
            list_idx += 1
            if list_idx < len(l):
                context[var_name] = l[list_idx]
                i = token_idx
                for_stack.append((token_idx, list_name, var_name, list_idx))
        i += 1
        
    return rendered_string
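
The malformed cases mentioned before the code (a for/if without its end, or an unknown {% %} statement) could be caught up front with a small stack-based check. The sketch below is not part of the solution above; check_blocks and TemplateSyntaxError are hypothetical names, and it only looks at the first word of each statement.

```python
import re

# Captures the statement text between {% and %} (single-line tags only).
TAG = re.compile(r"\{%\s*(.*?)\s*%\}")

# Which opening keyword each closing keyword must pair with.
CLOSERS = {"endif": "if", "endfor": "for"}


class TemplateSyntaxError(Exception):
    """Hypothetical error type for unbalanced or unknown statements."""


def check_blocks(template):
    """Raise TemplateSyntaxError unless every if/for is properly closed."""
    open_blocks = []  # stack of (keyword, offset) for blocks still open
    for match in TAG.finditer(template):
        statement = match.group(1)
        keyword = statement.split()[0] if statement else ""
        if keyword in ("if", "for"):
            open_blocks.append((keyword, match.start()))
        elif keyword in CLOSERS:
            # The closer must pair with the most recently opened block.
            if not open_blocks or open_blocks[-1][0] != CLOSERS[keyword]:
                raise TemplateSyntaxError(
                    "unexpected %s at offset %d" % (keyword, match.start()))
            open_blocks.pop()
        else:
            raise TemplateSyntaxError(
                "unknown statement %r at offset %d" % (statement, match.start()))
    if open_blocks:
        keyword, offset = open_blocks[-1]
        raise TemplateSyntaxError(
            "unclosed %s opened at offset %d" % (keyword, offset))
```

Calling check_blocks(template) before rendering would turn the crash and the silently dropped statement into one explicit error.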

PS: You should add the following test, as this is a glaringly missing one and it's one I expect many parsers will fail on:

    ('nested for',
     '{% for l in lists %}'
       '{% for x in l %}'
         '{{ x }}'
       '{% endfor %}'
       '\n'
     '{% endfor %}', dict(lists=[[1,2,3], [4,5,6], ["hello ", "world"]]), "123\n456\nhello world\n"),
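
For anyone wondering how a renderer can get through a test like this, one option is to parse the template into a tree and render it recursively, so each loop body gets its own scope. The sketch below is an independent illustration, not a rewrite of the code above; the parse and render_nodes helpers and the tuple-based node layout are assumptions, and filters are not supported.

```python
import re

# Splitting on a capturing group keeps the tags: the result alternates
# text, tag, text, tag, ... so odd indices are always tags.
TOKEN = re.compile(r"(\{\{.*?\}\}|\{%.*?%\})")


def parse(tokens, pos=0, until=None):
    """Collect nodes up to the closing keyword `until`; return (nodes, pos)."""
    nodes = []
    while pos < len(tokens):
        token, is_tag = tokens[pos], pos % 2 == 1
        pos += 1
        if not is_tag:
            if token:
                nodes.append(("text", token))
        elif token.startswith("{{"):
            nodes.append(("var", token[2:-2].strip()))
        else:
            words = token[2:-2].split()
            if words == [until]:
                return nodes, pos
            if len(words) == 2 and words[0] == "if":
                body, pos = parse(tokens, pos, "endif")
                nodes.append(("if", words[1], body))
            elif len(words) == 4 and words[0] == "for" and words[2] == "in":
                body, pos = parse(tokens, pos, "endfor")
                nodes.append(("for", words[1], words[3], body))
            else:
                raise ValueError("unexpected statement: " + token)
    if until is not None:
        raise ValueError("missing " + until)
    return nodes, pos


def render_nodes(nodes, context):
    out = []
    for node in nodes:
        if node[0] == "text":
            out.append(node[1])
        elif node[0] == "var":
            out.append(str(context[node[1]]))
        elif node[0] == "if":
            if context[node[1]]:
                out.append(render_nodes(node[2], context))
        else:  # "for" node
            _, var_name, list_name, body = node
            for item in context[list_name]:
                # Copy the context so the loop variable never leaks outward.
                scope = dict(context)
                scope[var_name] = item
                out.append(render_nodes(body, scope))
    return "".join(out)


def render(template, context):
    nodes, _ = parse(TOKEN.split(template))
    return render_nodes(nodes, context)
```

Since the inner loop looks up its list in the scope created by the outer loop, "for x in l" sees the current value of l without any extra bookkeeping.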

Last edited by raja (October 13 2014)

xterm · October 13 2014

Added test, thanks raja.

http://pastie.org/9644179

LebGeeks
