Round trip


The story so far: I secretly rewrote a JavaScript engine from the ground up for a web development tools startup. Several commenters on the last entry were on tenterhooks to know what happened next: did the company adopt my engine or not? Well, I’m sorry to disappoint you: things weren’t anything like as dramatic as I suspect you may be imagining. What actually happened was this: I checked the new engine into CVS, the beta release came out, the engine got a little bit of testing, and then it was shipped. Everyone was too busy to pay very much attention.

The company’s product was a web server application development environment that came with a number of C++ application programming interfaces: for accessing the file system, for user authentication, for processing web forms, for updating and querying the server’s database, and so on. It was my job to implement each of these APIs in the company’s JavaScript engine, and as I implemented each API, I did my best to balance these three requirements:

  1. consistency with the C++ versions of the APIs;
  2. consistency with the JavaScript language and its native functions; and
  3. convenience and ease of use.

Usually this was a straightforward balancing act, but in the case of the database query API it was not so easy. Imagine that you want to query the products table and select all records in the “book” category with “algorithms” somewhere in the product description and at least one item in stock. If you were issuing a SQL query, then you’d write:

SELECT * FROM products
WHERE category = 'book' AND desc LIKE '%algorithms%' AND stock >= 1;

But in the company’s C++ API you had to write:1

records = db::select(

This interface seemed to me to be horribly inconvenient to use: most likely the programmer started out with a SQL query in mind, which they had to laboriously transform into the sequence of API calls given above, only to have the API concatenate the result back into the SQL they were originally thinking of.

I suggested that this wasteful round trip could be avoided by adding an API call taking a SQL query and passing it straight through to the database. But my suggestion was firmly rejected. At this distance, I don’t remember how the developers justified their interface. Possibly the idea was to make it impossible to write code that was vulnerable to a SQL injection attack. That would have been a good reason, but I don’t think it can have been the reason they gave, because of the date: this article suggests that SQL injection attacks only started to be discussed publically in 1998, which would have been a year or so too late to have influenced the design of the API.

But I wasn’t defeated: I had a plan B, in the form of Lex and Yacc. If I could use these tools to write one parser, I could use them to write another!2 In particular, a parser for the subset of the SQL language that can appear in a WHERE clause in a SELECT statement. As the parser matched each clause, it output the appropriate calls to the db:: functions, effectively using the database API to build a parse tree:

/* expression syntax */
  INTEGER        { $$ = db::integer($1); }
| FIELD          { $$ = db::field($1); }
| STRING         { $$ = db::string($1); }
| '(' exp ')'    { $$ = $1; }
| exp '=' exp    { $$ = db::compare($1, db::equal, $2); }
| exp '>=' exp   { $$ = db::compare($1, db::greater_than_or_equal, $2); }
| exp 'LIKE' exp { $$ = db::compare($1, db::like, $2); }
| exp 'AND' exp  { $$ = db::intersection($1, $2); }
| ...

With this parser in place, the SQL query I gave above would be implemented like this in JavaScript:"products", "category = 'book' AND desc LIKE '%algorithms%' AND stock >= 1")

with the SQL query being parsed into the tree of calls to the database API, which would then be serialized back into (approximately) the same SQL that you started with, before being sent to the database.

This interface was so much more convenient3 than the C++ API that it single-handedly drove the adoption of JavaScript within the company, and within a few weeks everyone working on anything involving database queries had switched from C++ to JavaScript.

  1.  I’m relying on my memory here, so the details are invented. But I believe it gives an accurate impression of the complexity and inconvenience of the interface.

  2.  The original Lex and Yacc used global variables to maintain state, making it tricky to put two parsers in the same program,4 but I was actually using Flex and Bison, which have support for multiple parsers.

  3.  However, with the benefit of hindsight I can see that it was this kind of convenient interface that made it easy for people to write code that was vulnerable to SQL injection attacks.

  4.  But not impossible: with suitable #defines you can use the C preprocessor to rename the globals in different compilation units.