Partitioned archive tables

Is there anyone using partitioned archive tables in MySQL 5.1 for storing logs (or other voluminous data)?

Storing large amounts of logs in a relational database can bring special challenges.

Logs can take up huge amounts of space on disk, and while disk space is cheap, disk I/O can be expensive, performance-wise. But many logs compress really well, and for this the MySQL archive storage engine (insert-only, no indexes, gzip’ed storage) can be used.

You often want to scan across a few hours or days worth of logs, and indexes are poor for this purpose as the large number of disk seeks can kill performance. And full table scans of years of logs is not all that much fun either. For this, table partioning (supported in MySQL 5.1) is very useful. Store each day or week of logs in a separate table partition, partitioned on day (maybe use the 5.1 event sceduler, or simply a cron job, to add new partitions automatically). Then efficient full-partition scans of just the days or weeks in question can be used.

So why not combine both approaches? MySQL 5.1 supports partitioned archive tables just fine:

  CREATE TABLE mylog (unix_time INT, msg VARCHAR(1000))
  ENGINE=archive
  PARTITION BY RANGE(unix_time)
  (PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP('20060902000000')),
   PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('20060903000000')));

This has the potential to really boost performance of relational log management. Very low space usage, and still full SQL query access to the data, with good performance.

A quick test confirms this (after adding some more partitions). First load some rows:

  perl -MDBI -e '$dbh=DBI->connect("dbi:mysql:"); $dbh->do("INSERT INTO mylog values($_, \"log message on $_\")") for (1157148000 .. 1157148000 + 86400*5 - 1)'

Only about 10 bytes used per row, great (archive storage engine doing it’s bit)! And queries in limited date ranges are faster:

mysql> select * from mylog where msg LIKE '%71234%';
+------------+---------------------------+
| unix_time  | msg                       |
+------------+---------------------------+
| 1157171234 | log message on 1157171234 |
| 1157271234 | log message on 1157271234 |
| 1157371234 | log message on 1157371234 |
| 1157471234 | log message on 1157471234 |
| 1157571234 | log message on 1157571234 |
+------------+---------------------------+
5 rows in set (2.15 sec)

mysql> select * from mylog where unix_time >= unix_timestamp('20060903100000') and unix_time < unix_timestamp('20060903105959') and msg LIKE '%71234%';
+------------+---------------------------+
| unix_time  | msg                       |
+------------+---------------------------+
| 1157271234 | log message on 1157271234 |
+------------+---------------------------+
1 row in set (0.66 sec)

So partition pruning is working as expected.

6 comments

  1. I’m using table partitioning on archive and myisam tables in a data warehouse. Can’t wait for the MySQL 5.1 tree to become ‘Generally Available’.

    1. I’m using MyISAM table for the reason you’ll see later. You may think why do not you simply rebuild table by using OPTIMIZE TABLE.

  2. Are you by any chance from Denmark? My name is Jared, and I’m searching for a friend with whom I’d lost touch with whose name is Kristian Nielsen. If you are, please respond here so that I can catch up with you again! If not… well, I’m really sorry for this post 🙂 Thanks!!!

    1. Sorry, but I thought I might include more detail since I realize there can be more than 1 Kristian Nielsen from Denmark. The Kristian I’m searching for has a son name Jonas and an ex-wife named Lillan. He works with computers and programming.

Leave a comment

Your email address will not be published. Required fields are marked *