I came across a series yesterday that aims to build an ECMAScript interpreter in Python. As homework, the author offered the simple task of implementing a tokenizer. I hadn't implemented one before, so I thought it might be fun to come up with one.
What is a tokenizer then? Basically it takes an input string and splits it into a series of tokens, each carrying some metadata (such as the token type and its value). Instead of trying to complicate things further, it's probably easier if you take a look at my simple (hopefully!) implementation. I tried to keep it as readable as possible, really. :) Anyway, here we go:
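To give you the rough idea, a minimal regex-based tokenizer might look something like this. Note that this is just a sketch, not the actual implementation; the token names and patterns here are my own picks for illustration:

```python
import re

# Illustrative token types (hypothetical; a real tokenizer would
# likely use different names and a richer set of patterns).
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+(\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=]"),
    ("WHITESPACE", r"\s+"),
]

# Combine the patterns into one regex with named groups so a single
# match tells us both the token's text and its type.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)

def tokenize(source):
    """Split `source` into (type, value) tuples, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if not match:
            raise ValueError(f"Unexpected character at {pos}: {source[pos]!r}")
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("a + 42")` with this sketch would yield `[("IDENTIFIER", "a"), ("OPERATOR", "+"), ("NUMBER", "42")]`.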
It's probably far from perfect (pointers are welcome) but it appears to work just fine in the given cases. I found the development really straightforward thanks to the test-driven approach: I started out with simple tests and slowly expanded the test suite from there. Later on it could be expanded to support more complicated cases.
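For the curious, that kind of test suite could start out along these lines. The `tokenize` here is just a naive stand-in so the snippet runs on its own; the test names and cases are my own guesses, not the ones I actually wrote:

```python
import unittest

def tokenize(source):
    # Naive stand-in tokenizer (splits on whitespace and tags digits
    # as NUMBER, everything else as WORD) so the tests below run.
    return [("NUMBER" if chunk.isdigit() else "WORD", chunk)
            for chunk in source.split()]

class TokenizerTest(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(tokenize(""), [])

    def test_single_number(self):
        self.assertEqual(tokenize("42"), [("NUMBER", "42")])

    def test_mixed_input(self):
        self.assertEqual(
            tokenize("answer is 42"),
            [("WORD", "answer"), ("WORD", "is"), ("NUMBER", "42")],
        )

if __name__ == "__main__":
    unittest.main()
```

Starting from trivial cases like the empty string and working up keeps each failing test small enough to fix in one step.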