My main goal in this part is to come up with system tests for the application. To accomplish this I will instrument the application properly and invent a simple scheme that may be used to mimic the user input and the expected output.
I will also convert the Python code evaluation part of the interpreter into a command and come up with a priority-based scheme that lets me specify that the evaluation should happen only if nothing else matched first. This should help to get rid of the clash between command aliases and built-in Python functions.
So far we have written quite a few tests and managed to produce an implementation matching them. Using tests we have come up with scaffolding that gives us some confidence in the validity of the code. Of course, tests do not guarantee that everything works as it should, and I'm certain there are aspects that could have been tested better.
There is one crucial part that has not been tested properly yet: the interaction between the user and the application. In retrospect it would have been possible to start by writing these tests first and come up with a design for the application later, but I suppose it's okay to write them now. At least we know a bit better what we want now, if nothing else. :)
Now the question is: which interactions to test? As Placidity already contains functionality to handle math, variables and help, I will focus on those for now. More may be added later.
It may seem that this sort of testing duplicates work, as each command already contains doctests. There is a distinction between the doctests and these system tests, however. The system tests operate at a more general level and exercise multiple commands at once. They also act as documentation of what you can do with the default system.
I am going to use the following syntax for the tests:
>>> a = 5
>>> b = 10
>>> a + b
15
>>> vars
Stored variables:
a=5
b=10
So in principle the syntax is very much like that of doctests, but a bit friendlier, as it handles newlines properly (in my view :) ). The prefix (">>> ") denotes user input, while lines without it denote output. Essentially the format is exactly the same as the one an actual user sees.
The format seems simple enough. I am going to use the existing testing infrastructure (i.e. py.test) with a custom component that handles parsing the format, feeding it to the application and fetching the result. As the implementation is quite lengthy, it's available here. Note that the application tests of the current implementation fail on purpose. I will look into implementing the needed changes in the next section.
I created a new class, ScenarioTester, that handles parsing the test scenario and running it. The idea is that an application provides input and output hooks which ScenarioTester overrides with methods of its own that utilize the information parsed from the scenario. This way it can manage the application IO while testing.
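The core of the idea can be sketched roughly like this. Note that this is an illustrative sketch based on the description above, not the actual implementation; all names besides ScenarioTester are my own.

```python
from collections import deque

class ScenarioTester:
    """Parses a scenario and drives an application through IO hooks.

    Simplified sketch; the real implementation differs in detail.
    """
    PROMPT = '>>> '

    def __init__(self, scenario):
        self.inputs = deque()
        self.outputs = deque()
        for line in scenario.strip().splitlines():
            if line.startswith(self.PROMPT):
                self.inputs.append(line[len(self.PROMPT):])
            else:
                self.outputs.append(line)

    def input(self):
        # Hook replacing the application's input; stop when inputs run out
        if not self.inputs:
            raise SystemExit
        return self.inputs.popleft()

    def output(self, result):
        # Hook replacing the application's output; compare to the expected line
        expected = self.outputs.popleft()
        assert result == expected, '%r != %r' % (result, expected)
```

The application under test would then have its own input and output hooks replaced by these two methods before the scenario is run.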
It might be a good idea to reuse ScenarioTester later for implementing an enhanced doctest runner, making it possible to get rid of those pesky doctest.testmod() invocations in the commands. This would also offer proper multiline support for output automatically.
Separating Python Evaluator from Interpreter
For starters it might be a good idea to prototype Python evaluator as a command. Here's my first sketch:
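Reconstructed from the description that follows, the sketch could look roughly like this (the actual file in the repository differs in detail):

```python
class Python:
    """Evaluates arbitrary Python expressions as a fallback command."""
    priority = 'low'  # only evaluated if nothing else matched first

    def matches(self, expression):
        # Low priority lets us safely accept any input
        return True

    def execute(self, expression):
        try:
            return eval(expression)
        except Exception as e:
            # Return exceptions as strings instead of letting them leak through
            return str(e)
```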
The file should go amongst the rest of the commands, just as before (i.e. in /commands/python). There are a couple of interesting points in the implementation:
First of all, you can see the priority idea I mentioned earlier. As the priority is set to "low", I let the command accept any input. I will need to alter the plugin loader and the interpreter to add support for this particular feature.
Secondly, I extended the parameter definition of execute to contain "expression". I don't want the possible exceptions raised by "eval" to leak through, so I made execute return them as strings instead. I will need to alter the interpreter to support the additional parameter.
I will look into making the changes required by the sketch above next. I'm going to add priority to the plugin loader first and focus on the interpreter after that.
Definition of "Priority"
It might be a good idea to specify better what I mean by priority at this point. Currently it's enough to support a minimum of two levels of priority (low and not low). I'm going to define that a priority may have a "low", "normal" (medium) or "high" value, ordered by importance in ascending order. A command does not need to declare a priority; by default it should be set to "normal".
I could have used some arbitrary number-based system for priority, but I think the scheme proposed above will do just fine for now. After all, the main goal of the design is to loosely constrain the evaluation order of the commands.
Adding "Priority" to the Plugin Loader
Now that the specification for priority is clear, it's relatively easy to come up with tests. I separated each "feature" of the plugin loader into a test class of its own and made them pass. I also refactored the plugin loader to be more granular, as I prefer that each function or method handles only one task. As the implementation is a bit lengthy, I uploaded the files here.
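As the files themselves live in the repository, here's roughly what the core of the ordering logic could look like. This is a sketch of the specification above; the helper name and representation are my own, not the actual code.

```python
# Map the symbolic priorities to ranks; "normal" is the implicit default
PRIORITY_RANKS = {'low': 0, 'normal': 1, 'high': 2}

def sort_by_priority(commands):
    """Return the commands ordered from highest to lowest priority.

    Commands without a priority attribute are treated as "normal".
    """
    return sorted(
        commands,
        key=lambda command: PRIORITY_RANKS[getattr(command, 'priority', 'normal')],
        reverse=True,
    )
```

Sorting from highest to lowest means the interpreter can simply try commands in order and the low-priority Python evaluator naturally ends up last.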
If you run the tests now, you may notice that three application tests fail. I will look into fixing them next by modifying the interpreter to be up to date with the specification.
As the Python evaluation part has been separated to a command of its own, this is going to change the design of the interpreter a bit. The main benefit of this is that it allows us to get rid of the nasty eval in the interpret method. It also simplifies the interpreter tests somewhat as it allows us to forget some low level detail.
Sadly, eval doesn't handle assignment. It sounds like we need another command for this particular aspect, namely Assignment. The tests and the code are a bit complicated in this case due to the many corner cases, as I tried to mimic the semantics of Python. You can find my implementation of assignment here.
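To give an idea of the shape of such a command, here is a heavily simplified sketch. The real implementation mimics Python's assignment semantics much more closely and handles many more corner cases; the names and details below are assumptions of mine.

```python
class Assignment:
    """Stores 'name = value' style expressions into the variable pool.

    Simplified sketch; the actual command covers far more corner cases.
    """

    def matches(self, expression):
        head, eq, tail = expression.partition('=')
        if not eq:
            return False
        # Reject comparisons such as ==, !=, <= and >=
        return head.rstrip()[-1:] not in ('!', '<', '>') and not tail.startswith('=')

    def execute(self, expression, variables):
        name, _, value = expression.partition('=')
        # Evaluate the right-hand side against the known variables
        variables[name.strip()] = eval(value.strip(), {}, variables)
```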
Mocking out the exception cases gave me a better idea of how the Python interpreter command should work. Instead of outputting possible errors as strings as I stated above, it makes sense to let them be caught later, because they may be triggered indirectly (e.g. via an invalid assignment in this case). I'm going to adjust the implementation of the Python command to match this observation next. You can find the fixed version here.
As the needed commands are in place now we can finally focus on making the application tests pass. First of all it's probably a good idea to rethink the interpreter tests to match the new specification set by the observations above.
Considering that we treat assignment and Python evaluation as commands of their own, we no longer need tests for those specific cases at the interpreter level, so let's get rid of those. This in turn makes the abstract class for the different test cases redundant. It also allows us to eliminate the utils module altogether!
There are a few new things that need to be tested. First of all, the interface of the execute method has been expanded: it should support a new parameter, "expression", as required by the Python command. Furthermore, it's important that it handles cases in which multiple parameters are needed, just as in the Python and assignment commands. Mocking out these cases should be straightforward.
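One way to support a varying execute interface is to inspect the method's signature and pass only the parameters it actually names. The mechanism in the actual code may well differ; this is a sketch of the idea using the standard library's inspect module.

```python
import inspect

def execute_command(command, **available):
    """Call command.execute with only the parameters its signature names.

    'available' would typically contain things like expression=... and
    variables=... provided by the interpreter. Sketch only.
    """
    wanted = inspect.signature(command.execute).parameters
    kwargs = {name: value for name, value in available.items() if name in wanted}
    return command.execute(**kwargs)
```

This way a command asking only for "expression" and a command asking for both "expression" and "variables" can be invoked through the same code path.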
In addition, we need to make sure that the commands get evaluated in the right order based on their priority. This calls for yet another test case.
One final aspect to test is the "null" behavior: should a command raise an exception, the interpreter should just return "null". That should work fine for now. This leads to yet another specific test.
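Putting these pieces together, the core of the interpreter loop could look roughly like this. This is a simplified sketch of the behavior described above, not the actual implementation; it assumes the commands are already sorted by priority and that execute takes just the expression.

```python
def interpret(commands, expression):
    """Try commands in priority order; return 'null' if nothing works.

    Simplified sketch: commands are assumed to be sorted from highest
    to lowest priority already.
    """
    for command in commands:
        if command.matches(expression):
            try:
                return command.execute(expression)
            except Exception:
                # The "null" behavior: swallow the error for now
                return 'null'
    return 'null'
```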
You can find the revamped interpreter here. I changed the "execute" method so that it is assigned on the instance instead of the class. Treating it as a class-level method was not a good idea, as it broke a test case in which its return value could vary. I also got rid of the "self" from the method parameters, as it became redundant.
Despite the changes made above, most of the application tests still fail. I will look into making them pass next.
Making Application Tests Pass
Let's look at some simple application test, such as "test_math". Here's the scenario it contains:
>>> a = 5
>>> b = 10
>>> a + b
15
>>> 15 * 3
45
If you run it ("py.test -k test_math"), you can see that it gives an OutputError pointing to the line "b = 10". The error also mentions that the result received was 'null'. As this does not tell much by itself, I caught the exception temporarily in the interpreter and returned it instead:
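In essence the temporary change amounts to something like this (a sketch with the surrounding interpreter simplified away; names are mine):

```python
def interpret_debug(command, expression):
    """Temporary debugging variant: return the exception text instead of 'null'."""
    try:
        return command.execute(expression)
    except Exception as e:
        # Surface the actual error so we can see what went wrong
        return str(e)
```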
I probably should implement a debug flag for these sorts of cases, but for now this "hack" is enough. The result given by this is revealing: "'list' has no attribute 'execute'". It looks like something is going wrong in the assignment command.
The culprit may be found in the set_variables function. Apparently the "find" method of the Commands class does not return a command as I expected. Currently there are no direct tests for the class, so I added some and changed it to match the new behavior. You can find the new version here.
Running the math scenario of the application tests again ("py.test -k test_math") gives a different kind of error now: the result was None instead of 'null'. Apparently the interpreter works just as expected now; the application is simply missing a check for None. Adding the missing check takes us a bit further:
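The check itself is tiny. A sketch of one round of the application loop with the check in place (the surrounding names are assumptions, not the actual code):

```python
def run_once(interpreter, read_line, write):
    """One round of the application loop, with the missing None check added."""
    result = interpreter.interpret(read_line())
    if result is not None:
        # Only produce output when the interpreter actually returned something
        write(result)
```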
Now it gives "InputError: Expected input but got output instead! Failed at line "15"." The problem seems to be that "a + b" does not give any output. It just returns None, which in turn makes the tester expect another input. I traced the error to the assignment command: apparently it matches "a + b" even though it shouldn't. You can find the fixed version here.
Running test_math now gives yet another error. It fails to match the output, as the interpreter returns an integer while the scenario expects a string. This time the issue has to do with the scenario tester: as the scenarios are depicted as strings, the interpreter output should be matched against strings as well. I altered the tester to reflect this fact. Updated versions may be found here.
Now it actually runs the math test until it fails with an "IndexError: pop from an empty deque". This problem has to do with the polling loop of the application: the scenario tester simply runs out of inputs. I remedied this by making the application catch "SystemExit" outside the loop so that whenever it's raised, the loop terminates. The solution may be found here.
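Structurally the fix looks roughly like this (again a sketch; the surrounding names are simplified assumptions):

```python
def run(interpreter, read_line, write):
    """Application polling loop; SystemExit raised anywhere inside terminates it."""
    try:
        while True:
            result = interpreter.interpret(read_line())
            if result is not None:
                write(result)
    except SystemExit:
        # The scenario tester raises this once its inputs run out
        pass
```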
Running the math test now should work just fine! Only two scenarios are left to fix. Let's look at the help scenario next. Apparently it fails because it prints out help for all commands instead of a specific one.
By looking at the implementation of the help command, I noticed that the original idea was that the matches method should return an instance of the command it matched, and that the interpreter should execute it.
In retrospect that wasn't such a good idea. Instead it's probably clearer to handle specific help as a command of its own and make sure it gets matched before the regular help. This can be accomplished using the priority system that's now in place. You can find the implementation of specific help here.
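The gist of such a command could be sketched as follows. This is my own reconstruction of the idea, not the repository code; the alias handling and help text lookup in particular are assumptions.

```python
class SpecificHelp:
    """Shows help for a single command.

    High priority makes sure it wins over the regular 'help' command.
    Sketch only; details differ from the actual implementation.
    """
    priority = 'high'

    def __init__(self, commands):
        self.commands = commands

    def matches(self, expression):
        parts = expression.split()
        return len(parts) == 2 and parts[0] == 'help'

    def execute(self, expression):
        name = expression.split()[1]
        for command in self.commands:
            if name in getattr(command, 'aliases', ''):
                return command.__doc__
        return 'No help found for %s' % name
```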
Only one scenario left to fix: test_variables. In this case it returns a MatchError. Apparently the interpreter outputs multiline strings all at once. This should be easy to fix by splitting the string and outputting it line by line. The fixed version of the application may be found here.
It took quite a bit of effort but now we have proper application tests that allow us to easily mimic the user input and the application output. It should be relatively easy to come up with new usage scenarios and implement them.
There are still a lot of features to add. It would be nice to offer support for file manipulation (terminal functionality) and provide a graphical user interface for instance.
I have thought about splitting the file system module off as a separate project, as it might be useful beyond this little experiment. I know it's far from optimal (for instance, it should access the actual file system lazily, only on demand), but with some development it might become useful.
The same applies to a proper doctest runner. It should not be too hard to reuse the file system module and the scenario tester to come up with one. The runner could be extended into a regular py.test compatible runner with some added goodies (proper looping, an assert preprocessor for nicer output, etc.).
From now on it should be a little bit easier to extend the system, as it is now quite well established. I will look into implementing a simple Eliza command in the next part of the series. If you happen to have ideas regarding functionality Placidity should have, let me know and I will look into it.
You may find the source code of this part here.