[Exercise] Basic CSS Rules Validator

Build a basic CSS Rules Validator that validates the content of a CSS file based on the following rules:
SELECTOR1 { 
    PROPERTY1: VALUE;
    PROPERTY2: VALUE;
}

SELECTOR2 { 
    PROPERTY1: VALUE;
    PROPERTY2: VALUE;
}
Example of valid CSS rules:
body {
    background-color: #FF0000;
}

.foo {
    color: Black;
    border-style: solid;
}
Example of invalid CSS rules:
body {{
    background-color: #FF0000
}

.foo {
    color: Black
    border style: solid;
}
P.S.: Make it extensible enough so that the next exercise can enhance the rules validator further.
I think this can be done with a single regular expression.
Yet I'm not so sure, and anyway it will be painful. Maybe a combination...
Anyway, in your valid CSS code example, the first line has no semi-column. Is that valid per your specs? I don't think it's valid CSS. And what about newlines: do they have to be respected, or can everything be on a single line?
You're right, I fixed the semicolon.

I don't think there's a problem with newlines, as everything is delimited accordingly. So no, you don't have to respect newlines; they were just there for presentation purposes.

Edit: Thinking about it further, newlines can help with debugging, since the validator can then report the line number of an invalid rule. Otherwise it'll always be "Line 1 invalid".
Keep in mind that this is a starter exercise and it will grow; that's why I mentioned that the implementation should be flexible enough for enhancements.
What about indentation? Is whitespace in front of properties significant?
No, same as newlines.

Whitespace is not an issue.
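The "combination of regular expressions" idea raised earlier could look roughly like this under the clarified rules (whitespace and newlines are insignificant, and every declaration ends with ';' as in the valid examples): one pattern matches a whole rule, another matches a single declaration inside its body, and counting the newlines before the failing position gives a line number to report. This is a minimal sketch; the patterns and the names RULE, DECL, line_of and first_invalid_line are assumptions, not part of anyone's posted solution:

```python
import re

# Assumed patterns for the exercise's format, not the full CSS grammar.
# A rule: a selector (no braces), then '{', a body without braces, '}'.
RULE = re.compile(r'[^{}]+?\s*\{([^{}]*)\}\s*')
# A declaration: property name, ':', a value without ';', ':' or braces, ';'.
DECL = re.compile(r'[A-Za-z-][\w-]*\s*:\s*[^;:{}]+?\s*;\s*')
WHITESPACE = re.compile(r'\s*')


def line_of(text, offset):
    """1-based line number of the character at `offset` in `text`."""
    return text.count("\n", 0, offset) + 1


def first_invalid_line(text):
    """Return None if `text` is valid, otherwise the line on which the
    first rule or declaration that fails to match starts."""
    pos = WHITESPACE.match(text).end()  # skip leading whitespace
    while pos < len(text):
        rule = RULE.match(text, pos)
        if rule is None:
            return line_of(text, pos)
        body_end = rule.end(1)
        # first non-whitespace character inside the braces
        bpos = WHITESPACE.match(text, rule.start(1)).end()
        while bpos < body_end:
            # endpos keeps the declaration pattern inside this rule's body
            decl = DECL.match(text, bpos, body_end)
            if decl is None:
                return line_of(text, bpos)
            bpos = decl.end()
        pos = rule.end()  # RULE already consumed trailing whitespace
    return None


valid = "body {\n    background-color: #FF0000;\n}\n"
invalid = ".foo {\n    color: Black\n    border style: solid;\n}\n"
print(first_invalid_line(valid))    # None
print(first_invalid_line(invalid))  # 2
```

It stays a fairly blunt check: a value containing ':' or ';' (a url(http://...) value, say) is rejected, and comments aren't handled at all.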
Well, here's my solution to the problem (signed up to post it :) ):

It's in Python (currently using 2.x syntax, but converting it to 3.x is simple: use print() instead of the print statement, and "except SomeError as e" instead of "except SomeError, e")
class ParseException(Exception):
    """
    Convenient way of indicating errors to be reported to the user
    """
    def __init__(self, reason, line):
        self.reason = reason
        self.line = line

    def __str__(self):
        return "Parse error on line %d: %s" % (self.line, self.reason)

def parse_string(text):
    """
    State machine that splits up the incoming text into a logical form
    that can be more easily processed later

    Returns a dict which maps selectors to their bodies. Selectors
    can be any string; whitespace (including newlines) before or after
    elements (selectors, properties, values) is ignored. The body of a
    selector consists of a list of dicts which describe each property
    affecting this selector.

    Some metadata is included, such as which line things occur on, for
    better error reporting later.

    May error out if the file is malformed at a very basic level, such
    as misplaced or mismatched braces. The selector and body syntax
    itself is not checked; this is delegated to users of this function.
    """
    # TODO do this without loading the entire file into memory

    # possible states:
    # selector: we start out here, enter the body on an opening brace
    #           and come back here after a closing brace
    #
    # key: we enter here after an opening brace in a selector. we exit
    #      upon reading a ":" character
    #
    # value: we enter from key after a ":" character. we exit
    #        either at "}" or ";". I'm assuming it's legal for the
    #        last (or single) statement in a css body not to be
    #        terminated by a semicolon (not semi-column :P)
    
    state = "selector"
    result = {}
    
    buf = ""
    cur_line = 1
    cur_selector = None
    cur_key = None
    
    for char in text:
        if char == "{":
            if state != "selector":
                raise ParseException("illegal '{' inside a selector body", cur_line)

            cur_selector = buf.strip()
            buf = ""
            if cur_selector not in result:
                result[cur_selector] = []
            state = "key"
            
        elif char == "\n":
            # All end of lines show up as "\n" even on windows as long
            # as the file is opened in text mode
            cur_line += 1
            buf += char
            
        elif char == "}":
            if state == "selector":
                raise ParseException("illegal '}' inside a selector", cur_line)
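            if state == "key" and buf.strip() != "":
                raise ParseException("illegal '}' inside a property", cur_line)
            if state == "value" and buf.strip() == "":
                # e.g. "color: }", a property with no value
                raise ParseException("missing value before '}'", cur_line)

            if buf.strip() != "":
                # there's a dangling key:value that hasn't been
                # inserted yet (the last declaration may omit its ';')

                # TODO line reporting here isn't very accurate
                # (multiline property definition?), consider a better
                # strategy maybe?
                result[cur_selector].append({'property': cur_key, 'value': buf.strip(), 'line': cur_line})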
                
            buf = ""
            state = "selector"

        elif char == ";":
            if state != "value" or buf.strip() == "":
                raise ParseException("Illegal ';'", cur_line)
            result[cur_selector].append({'property':cur_key, 'value':buf.strip(), 'line':cur_line})
            buf = ""
            state = "key"

        elif char == ":":
            if state != "key" or buf.strip() == "":
                raise ParseException("Illegal ':'", cur_line)
            state = "value"
            cur_key = buf.strip()
            buf = ""

        else:
            buf += char
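    # running out of input outside the "selector" state means a '{' was
    # never closed; leftover text means a selector with no body
    if state != "selector":
        raise ParseException("missing '}' at end of input", cur_line)
    if buf.strip() != "":
        raise ParseException("selector without a body at end of input", cur_line)
    return result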

if __name__ == "__main__":
    import sys, pprint
    pp = pprint.PrettyPrinter(indent=4)
    if len(sys.argv) < 2:
        print "Usage: %s <file to parse> [<other file> [...]]" % sys.argv[0]
        sys.exit(1)

    for fname in sys.argv[1:]:
        print "Parsing %s" % fname
        try:
            pp.pprint(parse_string(open(fname).read()))
        except ParseException, e:
            print "Invalid!"
            print e
        except IOError, e:
            print "Cannot read file!"
            print e
Here is the result when run on all three examples:
$ python cssvalidator.py test_file.css test_file2.css test_file3.css 
Parsing test_file.css
{   'SELECTOR1': [   {   'line': 2, 'property': 'PROPERTY1', 'value': 'VALUE'},
                     {   'line': 3, 'property': 'PROPERTY2', 'value': 'VALUE'}],
    'SELECTOR2': [   {   'line': 7, 'property': 'PROPERTY1', 'value': 'VALUE'},
                     {   'line': 8, 'property': 'PROPERTY2', 'value': 'VALUE'}]}
Parsing test_file2.css
{   '.foo': [   {   'line': 6, 'property': 'color', 'value': 'Black'},
                {   'line': 7, 'property': 'border-style', 'value': 'solid'}],
    'body': [{   'line': 2,
                    'property': 'background-color',
                    'value': '#FF0000'}]}
Parsing test_file3.css
Invalid!
Parse error on line 1: illegal '{' inside a selector body
This parser essentially massages the input into a form that is easy to validate later in other ways (for example, specific rules about valid selector/property/value syntax). It only checks for very basic validity itself.

I ran it on a pretty large CSS file from a project I'm currently working on, and it worked just fine. If anyone has examples that break my code (either by making it error out on valid input or accept invalid input), let me know :)
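Since parse_string deliberately leaves the finer rules to its callers, one way to keep the validator extensible for the next exercise is to write each extra rule as a small function over the parsed dict and run them all in turn. A minimal sketch, assuming the output shape shown above (each selector maps to a list of {'property', 'value', 'line'} dicts); validate, DEFAULT_RULES and the two example rules are hypothetical names, and the property-name pattern is a simplification of real CSS:

```python
import re

# Assumed property-name syntax: optional leading '-' (vendor prefixes),
# then letters, digits and hyphens. Not the full CSS grammar.
PROPERTY_NAME = re.compile(r'^-?[A-Za-z][A-Za-z0-9-]*$')


def check_selector_not_empty(selector, decl):
    if selector == "":
        return "declaration without a selector"
    return None


def check_property_name(selector, decl):
    if not PROPERTY_NAME.match(decl['property']):
        return "invalid property name %r" % decl['property']
    return None


# Adding a rule means writing one more function and appending it here.
DEFAULT_RULES = [check_selector_not_empty, check_property_name]


def validate(parsed, rules=DEFAULT_RULES):
    """Run every rule on every declaration of a parse_string()-style dict.

    Returns a list of (line, message) pairs sorted by line; an empty list
    means every rule passed."""
    errors = []
    for selector, decls in parsed.items():
        for decl in decls:
            for rule in rules:
                message = rule(selector, decl)
                if message is not None:
                    errors.append((decl['line'], message))
    return sorted(errors)


parsed = {
    '.foo': [{'property': 'color', 'value': 'Black', 'line': 2},
             {'property': 'border style', 'value': 'solid', 'line': 3}],
}
print(validate(parsed))  # [(3, "invalid property name 'border style'")]
```

Each rule only sees one selector and one declaration, so a check on allowed values, for instance, would be one more function in the list rather than a change to the parser.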
Raja,

Thank you very much for posting this; you actually made my day by saying that you signed up to post it. I do hope, however, that you stick around, since we do this quite a lot.

Just make sure you introduce yourself in the members' introduction topic!

Your solution is interesting, your documentation is outright awesome and your TODOs are right on the spot!
Well, sorry if this ends up ruining your day, but apparently I was already registered, though under a different name, way back in 2009. This is me: http://www.lebgeeks.com/forums/viewtopic.php?id=5245

I guess the forum software didn't catch it as I'm using a different email now. Think I should re-post in the introductions?
Hey there, nice solution!
TODO do this without loading the entire file into memory
If I understand correctly, you just have to replace
pp.pprint(parse_string(open(fname).read()))
with
pp.pprint(parse_string(open(fname)))
Since a file is its own iterator, you don't need to call the read() function.
Not necessarily. I added that TODO as a quick afterthought, but I think iterating over a file gives you lines, not characters, so it might be a bit more involved than that. I'll have to check and get back to you.

Edit: Indeed, I just tried it, and iterating over a file directly yields lines, not single characters.

The following code could be used to wrap it, though:
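def file_wrapper(fname):
    # yield the file's characters one at a time; 'with' closes the
    # file when the generator finishes
    with open(fname) as fh:
        while True:
            char = fh.read(1)
            if char == '':  # read() returns '' at end of file
                break
            yield char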
And then you get:
pp.pprint(parse_string(file_wrapper(fname)))
That would work, but it only matters for really large files, where loading them into memory would be a problem. Since we're talking about CSS files here, that's quite unlikely (I doubt there are CSS files larger than 100MB; that would be stupid).
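If streaming ever did matter, one alternative to file_wrapper (not something from the thread) is to read the file in fixed-size chunks and yield characters from each chunk: memory stays bounded by the chunk size without one read() call per character. Iterating line by line and then over each line's characters would also work, but a minified CSS file is often a single line, so that wouldn't bound memory. A minimal sketch; iter_chars and the 64 KiB default chunk size are assumptions:

```python
import io
from functools import partial


def iter_chars(fh, chunk_size=64 * 1024):
    """Yield the characters of a file opened in text mode, one at a time,
    while holding at most `chunk_size` characters of it in memory."""
    # iter(callable, sentinel) calls fh.read(chunk_size) repeatedly and
    # stops when it returns '' (end of file in text mode).
    for chunk in iter(partial(fh.read, chunk_size), ''):
        for char in chunk:
            yield char


# io.StringIO stands in for a real file here.
sample = io.StringIO("body {\n    color: Black;\n}\n")
print("".join(iter_chars(sample, chunk_size=4)) == sample.getvalue())  # True
```

Since parse_string just loops over whatever it is given, the call in the script would become:
with open(fname) as fh:
    pp.pprint(parse_string(iter_chars(fh)))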