{"id":22,"date":"2006-09-12T21:01:00","date_gmt":"2006-09-12T21:01:00","guid":{"rendered":"https:\/\/knielsen-hq.org\/w\/?p=22"},"modified":"2021-08-31T16:21:21","modified_gmt":"2021-08-31T16:21:21","slug":"partitioned-archive-tables","status":"publish","type":"post","link":"https:\/\/knielsen-hq.org\/w\/partitioned-archive-tables\/","title":{"rendered":"Partitioned archive tables"},"content":{"rendered":"\n<p>Is there anyone using partitioned archive tables in MySQL 5.1 for storing logs (or other voluminous data)?<\/p>\n\n\n\n<p>Storing large amounts of logs in a relational database can bring special challenges.<\/p>\n\n\n\n<p>Logs can take up huge amounts of space on disk, and while disk space is cheap, disk I\/O can be expensive, performance-wise. But many logs compress really well, and for this the MySQL archive storage engine (insert-only, no indexes, gzip&#8217;ed storage) can be used.<\/p>\n\n\n\n<p>You often want to scan across a few hours or days worth of logs, and indexes are poor for this purpose as the large number of disk seeks can kill performance. And full table scans of years of logs is not all that much fun either. For this, table partioning (supported in MySQL 5.1) is very useful. Store each day or week of logs in a separate table partition, partitioned on day (maybe use the 5.1 event sceduler, or simply a cron job, to add new partitions automatically). Then efficient full-partition scans of just the days or weeks in question can be used.<\/p>\n\n\n\n<p>So why not combine both approaches? MySQL 5.1 supports partitioned archive tables just fine:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  CREATE TABLE mylog (unix_time INT, msg VARCHAR(1000))\n  ENGINE=archive\n  PARTITION BY RANGE(unix_time)\n  (PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP('20060902000000')),\n   PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('20060903000000')));\n<\/pre>\n\n\n\n<p>This has the potential to really boost performance of relational log management. Very low space usage, and still full SQL query access to the data, with good performance.<\/p>\n\n\n\n<p>A quick test confirms this (after adding some more partitions). First load some rows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  perl -MDBI -e '$dbh=DBI-&gt;connect(\"dbi:mysql:\"); $dbh-&gt;do(\"INSERT INTO mylog values($_, \\\"log message on $_\\\")\") for (1157148000 .. 1157148000 + 86400*5 - 1)'\n<\/pre>\n\n\n\n<p>Only about 10 bytes used per row, great (archive storage engine doing it&#8217;s bit)! And queries in limited date ranges are faster:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mysql&gt; select * from mylog where msg LIKE '%71234%';\n+------------+---------------------------+\n| unix_time  | msg                       |\n+------------+---------------------------+\n| 1157171234 | log message on 1157171234 |\n| 1157271234 | log message on 1157271234 |\n| 1157371234 | log message on 1157371234 |\n| 1157471234 | log message on 1157471234 |\n| 1157571234 | log message on 1157571234 |\n+------------+---------------------------+\n5 rows in set (2.15 sec)\n\nmysql&gt; select * from mylog where unix_time &gt;= unix_timestamp('20060903100000') and unix_time &lt; unix_timestamp('20060903105959') and msg LIKE '%71234%';\n+------------+---------------------------+\n| unix_time  | msg                       |\n+------------+---------------------------+\n| 1157271234 | log message on 1157271234 |\n+------------+---------------------------+\n1 row in set (0.66 sec)\n<\/pre>\n\n\n\n<p>So <a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/partitioning-pruning.html\">partition pruning<\/a> is working as expected.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Is there anyone using partitioned archive tables in MySQL 5.1 for storing logs (or other voluminous data)? Storing large amounts of logs in a relational database can bring special challenges. Logs can take up huge amounts of space on disk, and while disk space is cheap, disk I\/O can be expensive, performance-wise. But many logs&hellip; <a class=\"more-link\" href=\"https:\/\/knielsen-hq.org\/w\/partitioned-archive-tables\/\">Continue reading <span class=\"screen-reader-text\">Partitioned archive tables<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[4],"_links":{"self":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts\/22"}],"collection":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/comments?post=22"}],"version-history":[{"count":2,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts\/22\/revisions"}],"predecessor-version":[{"id":26,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts\/22\/revisions\/26"}],"wp:attachment":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/media?parent=22"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/categories?post=22"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/tags?post=22"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}