This is a software development story from early in my career, some time in the late 1990s. The company in question went bust many years ago, so it’s probably safe to tell the tale now.[1]
parseInt weren’t implemented. There didn’t seem to be any string methods. Arrays were missing. Object prototypes didn’t work. At least the core syntax was working, but it did seem very slow. Very slow. I mean, interpreted code can easily be a hundred times slower than compiled code, but it shouldn’t take several seconds to evaluate something like
for (i=0; i<1000; ++i);.
Clearly the most important thing was to figure out what was going on inside the interpreter. So I started to read the code. And all became clear:
The failure to store a representation of the parsed code was disastrous because it meant that in order to evaluate an expression or statement for a second time, the engine had to parse it a second time. And if a loop executed a thousand times, then the engine had to parse the loop body (and termination condition) a thousand times. No wonder it was slow!
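The cost difference is easy to sketch. Here is an illustration in Python (standing in for the engine, which of course wasn’t written in Python): the broken approach re-parses the statement text on every trip around the loop, while the sane approach parses once and reuses the compiled representation.

```python
SOURCE = "x = x + 1"

def run_reparsing(n):
    # The broken engine: parse and compile the statement text afresh
    # on every single iteration.
    env = {"x": 0}
    for _ in range(n):
        exec(compile(SOURCE, "<loop>", "exec"), env)
    return env["x"]

def run_precompiled(n):
    # A sane engine: parse once, keep the compiled representation,
    # and execute it n times.
    code = compile(SOURCE, "<loop>", "exec")
    env = {"x": 0}
    for _ in range(n):
        exec(code, env)
    return env["x"]
```

Both produce the same result; only the second avoids paying the parsing cost a thousand times over.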
But how was I going to fix this disaster of an engine in the two weeks I had available? Normally under this kind of deadline it’s imperative that you work with what you’ve got, because if you try to rewrite from scratch then there’s a risk that you’ll end up with nothing. But sometimes the code is so bad that there’s nothing for it but to throw everything away and start again. Even if it’s the CTO who wrote it.2
Now, if there was one thing that I had been taught in my computer science degree, it was how to quickly implement a programming language. You take two very powerful tools:
and use these to build your parser. Then you use the syntax-directed translation technique to turn the output of the parser directly into stack-oriented byte code without having to represent the parse tree as a data structure. (This works neatly for context-free language syntaxes because each expression in the parse tree corresponds to pushing a value onto the stack in the byte code.) Finally, you need an engine to evaluate the byte code, but you can get away with only a handful of byte code instructions, and so the actual engine can be very small.
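The whole pipeline can be sketched in a few dozen lines. This is an illustration in Python with hypothetical instruction names, not the engine’s actual code: a recursive-descent parser for arithmetic expressions that emits stack byte code directly as it parses, never building a tree, plus the handful-of-instructions engine that evaluates the result.

```python
import re

def compile_expr(text):
    """Parse an arithmetic expression, emitting stack byte code directly."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0
    code = []  # instructions: ("PUSH", n), ("ADD",), ("SUB",), ("MUL",), ("DIV",)

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            take(); expr(); assert take() == ")"
        else:
            # Each operand in the source corresponds to a push in the byte code.
            code.append(("PUSH", int(take())))

    def term():
        factor()
        while peek() in ("*", "/"):
            op = take(); factor()
            code.append(("MUL",) if op == "*" else ("DIV",))

    def expr():
        term()
        while peek() in ("+", "-"):
            op = take(); term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    expr()
    return code

def run(code):
    """The byte code engine: a stack and a handful of instructions."""
    stack = []
    ops = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b,
           "MUL": lambda a, b: a * b, "DIV": lambda a, b: a / b}
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b = stack.pop(); a = stack.pop()
            stack.append(ops[instr[0]](a, b))
    return stack.pop()
```

Note how the parser never holds more than the byte code emitted so far: `compile_expr("1 + 2 * 3")` yields `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`, and crucially that compiled form can be run any number of times without re-parsing.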
Put methods on the classes you’re using for your object representations. Here’s a typical section from the standard:
When the [[Get]] method of O is called with property name P, the following steps are taken:
1. If O doesn’t have a property with name P, go to step 4.
2. Get the value of the property.
3. Return Result(2).
4. If the [[Prototype]] of O is null, return undefined.
5. Call the [[Get]] method of [[Prototype]] with property name P.
6. Return Result(5).
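Steps like these translate almost line for line into a method on your object class. A sketch in Python (a hypothetical `JSObject` class for illustration, not the engine’s actual representation):

```python
UNDEFINED = object()  # stands in for ECMAScript's undefined value

class JSObject:
    def __init__(self, prototype=None):
        self.properties = {}          # own properties
        self.prototype = prototype    # the [[Prototype]] internal slot

    def get(self, name):
        """The [[Get]] method: property lookup along the prototype chain."""
        # Steps 1-3: if O has a property with name P, return its value.
        if name in self.properties:
            return self.properties[name]
        # Step 4: if [[Prototype]] is null, return undefined.
        if self.prototype is None:
            return UNDEFINED
        # Steps 5-6: call the [[Get]] method of [[Prototype]] with name P.
        return self.prototype.get(name)
```

Skipping step 4, as the original implementation did, means a lookup that misses all the way up the chain crashes or misbehaves instead of yielding undefined.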
In the course of implementing the language semantics, I realized that the original implementation (the one that I had thrown away) was hopelessly wrong in many cases. It had obviously been reverse-engineered from Flanagan’s guide, not implemented directly from the standard, and consequently many details, like step 4 in the quote above, were missing or broken. Merely finding and fixing these kinds of semantic errors would have taken far longer than writing my own implementation did.
[1] The title comes from the first edition of Fred Brooks’ The Mythical Man-Month:
In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved […]
The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers. Seen this way, the answer is much clearer. Delivering that throwaway to customers buys time, but it does so only at the cost of agony for the user, distraction for the builders while they do the redesign, and a bad reputation for the product that the best redesign will find hard to live down.
Hence plan to throw one away; you will, anyhow.
[2] I remain, to this day, baffled as to how it came to be written like this. Parsing the source code again each time around a loop!
[3] This was in the days of 32-bit systems with small swap files, so you could actually run out of memory. Today such a program would simply allocate more and more swap, getting slower and slower but never quite running out.
[4] It was two weeks before this happened. When I did come to write a garbage collector I used the mark-and-sweep algorithm. But something puzzled me, and I couldn’t find an answer in any of the textbooks I looked at: how was I supposed to schedule the collections? In a classic description of a garbage collector, you wait until memory runs out and then you collect the world. But this leads to bad behaviour on modern systems, because of swapping, and because other processes need memory too. You need to schedule collections well before memory is full. But when exactly? I still don’t know of a comprehensive solution to this problem.