Placeholders and SQL injection

It is sad to see how 9X% (or should that be 99.X%?) of SQL applications are riddled with SQL injection bugs.

There really is no excuse for this. Nobody writes code like this:

sub stupid_sum {
    my ($list) = @_;
    my $string = shift @$list;
    for (@$list) {
      $string .= " + " . $_;
    }
    my $sum = eval($string);
    return $sum;
}

Right? Just because our computers use the Von Neumann architecture, where CPU instructions and data is stored in the same memory, does not mean that we cannot distinguish between code and data (ok, so in TeX we do not, but there is a reason TeX is not pleasant to write applications in).

So when we use functions to group our code into logical units, we have this fancy syntax for something called parameters. And we can write clever stuff like this:

int foo(int a) {
    return a + 1;
}

And so the “+” and the “1” are part of the code for the function foo(), while the other value “a” to be added is data which comes from another part of the program. Great stuff!

In fact, in the old days, people were using something called embedded SQL, which tries to keep this distinction for using SQL with another language. Though I have to admit, having used Oracle ProC, that this was quite horrible.

But there is no need for embedding the SQL into the syntax of every language, because these brilliant people invented the placeholder! So now we can also have parameters for SQL, an SQL string holds the code to be executed, and placeholders supply the values to be used from other parts of the program.

And it is so easy. No need for mysql_real_quote_string() and other horrors. Just do like this, using Perl and DBI as example:

sub mark_items {
    my ($dbh, $mark, @keys_list) = @_;
    my $placeholders = join(",", map("?", @keys_list));
    $dbh->do("UPDATE t SET flag = ? WHERE id IN (" . $placeholders . ")",
             undef,
             $mark, @keys_list);
}

And then you can sleep well at night not worrying about which kind of values are passed to your SQL, or whether “+” can maybe format your hard disk if someone passes in the wrong argument.

Just because modern dynamic languages make string concatenation easy, does not mean that confusing code and data is a good thing. Von Neumann architecture is good for CPUs, but at higher levels of abstraction we have moved on.

As Bjarne Stroustrup often says, just as plumbers need an education to be allowed to mess up pipes, why doesn’t a programmer need an education that makes him or her understand these things before being allowed to release software?

6 comments

  1. Stroustrup

    > As Bjarne Stroustrup often says, just as plumbers need an education to be allowed to mess up pipes, why
    > doesn’t a programmer need an education that makes him or her understand these things before being
    > allowed to release software?

    This is my favorite Stroustrup quote, actually. I mean, just look at his language. Oh man. 🙂

    But, btw, I agree with you, however it might’ve sounded like on drizzle-discuss 😉

    best,
    -k

  2. “It is sad to see how 9X% (or should that be 99.X%?)”

    Nah, it isn’t that bad…it’s more like one out of 10 or 20 (which is still quite bad of course)

  3. Placeholders for data only

    I’m an advocate of using parameter placeholders too, but to be fair we should acknowledge that parameters can be used only in place of constant data values in SQL expressions. If you have any other part of your query that you need to be dynamic, you still have to resort to string interpolation. For example, if you want to ORDER BY a user-selectable column.

    The best way to avoid SQL injection is to validate external input as strictly as possible, and then escape it as you interpolate into the SQL string. Many database API have a function to escape data values such as mysql_real_escape(), but I don’t usually see functions for applying delimiters for table or column identifiers.

    1. Re: Placeholders for data only

      There is no need to quote user-selectable columns. Instead do this:

      my $columns = { 1 => “create_dt”, 2 => “type”, 3 => “username” };
      die unless exists($columns->{$user_choice});
      my $sort_column = $columns->{$user_choice}”;
      my $sql = “SELECT * FROM t ORDER BY $sort_column”;

  4. Using the eval() fuction

    Placeholders are great for inserting tainted data in code before evaluation. And for writing SQL queries this is often all you need to getting the work done.

    In other cases being able to generate code, compile it, and run it on the fly (as in you stupid_sum) is a quite powerful technique. But of course you have to make sure that data tainted by outside users doesn’t leak into unwanted places. But as Stroustrup implies, you have to know what you’re doing.

    1. Re: Using the eval() fuction

      Agree.

      In Perl terms, my point can be expressed succinctly as follows:

      Don’t use eval “…” when you can use eval { … }

      🙂

Leave a Reply to knielsen Cancel reply

Your email address will not be published. Required fields are marked *