{"id":87,"date":"2008-11-27T23:51:00","date_gmt":"2008-11-27T23:51:00","guid":{"rendered":"https:\/\/knielsen-hq.org\/w\/?p=87"},"modified":"2021-09-01T11:00:42","modified_gmt":"2021-09-01T11:00:42","slug":"selecting-rows-holding-group-wise-maximum-of-a-field-part-two","status":"publish","type":"post","link":"https:\/\/knielsen-hq.org\/w\/selecting-rows-holding-group-wise-maximum-of-a-field-part-two\/","title":{"rendered":"Selecting rows holding group-wise maximum of a field, part two"},"content":{"rendered":"\n<p>Selecting rows holding group-wise maximum is a favorite problem of mine, but one which only rarely pops up. But for some reason, after my <a href=\"\/w\/selecting-rows-holding-group-wise-maximum-of-a-field\/\" data-type=\"post\">last blog post<\/a> on the subject, it seems to be mentioned almost daily around here.<\/p>\n\n\n\n<p>Something that I forgot to mention in the previous post is that most of the examples there assume suitable indexing is available to get decent performance. Basically a composite index on both the column(s) in the GROUP BY and the column over which MAX is computed is needed. In the example I gave, such an index is available throught the primary key.<\/p>\n\n\n\n<p>However, such an index may not be available in all cases. Maybe maintaining it would be too expensive, or maybe the data the max is computed over is itself the result of a (sub-)query, and no indexing is available. So it is worth it also to understand this case, as the performance of the different possible queries differ greatly from the indexed case.<\/p>\n\n\n\n<p>So let us modify the original example to not have any useful indexes:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE object_versions (\n  id INT PRIMARY KEY AUTO_INCREMENT,\n  object_id INT NOT NULL,\n  version INT NOT NULL,\n  data VARCHAR(1000)\n) ENGINE=InnoDB;\n<\/pre>\n\n\n\n<p>This time, I will use a data set of size only 1% of the previous example, as without indexes some of the queries get ridiculously poor performance. So let us take 10,000 rows, 1000 object each with 10 versions. I use this Perl long^H^H^H^Hone-liner to load the data:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">perl -MDBI -le '$vers=10; $groups=1000; $dbh=DBI-&gt;connect(\"dbi:mysql:\", \"test\",\n\"pass\", {RaiseError =&gt; 1}); $dbh-&gt;do(\"USE test\"); foreach $o (1..$groups)\n{ $dbh-&gt;do(\"INSERT INTO object_versions VALUES \" . join(\", \", map(\"(null, ?,?,?)\",\n1..$vers)), undef, map( ($o, $_, \"data_${o}_$_\"), 1..$vers)); }'\n<\/pre>\n\n\n\n<p>(Yes, I know&#8230; but I have a strange love for Perl one-liners).<\/p>\n\n\n\n<p>Here are the results:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mysql&gt; SELECT data FROM object_versions o1\nWHERE version = (SELECT MAX(version) FROM object_versions o2 WHERE o1.object_id = o2.object_id);\n1000 rows in set (25.55 sec)\n\nmysql&gt; SELECT o1.data FROM object_versions o1\nINNER JOIN (SELECT object_id, MAX(version) AS version FROM object_versions GROUP BY object_id) o2\nON (o1.object_id = o2.object_id AND o1.version = o2.version);\n1000 rows in set (0.72 sec)\n\nmysql&gt; SELECT o1.data FROM object_versions o1\nLEFT JOIN object_versions o2 ON o1.object_id = o2.object_id AND o1.version &lt; o2.version\nWHERE o2.object_id IS NULL;\n1000 rows in set (15.44 sec)\n\nmysql&gt; SELECT MAX(CONCAT(version, \":\", data)) FROM object_versions GROUP BY object_id;\n1000 rows in set (0.52 sec)\n\nmysql&gt; SELECT data FROM\n(SELECT * FROM object_versions ORDER BY object_id DESC, version DESC) t GROUP BY object_id;\n1000 rows in set (0.04 sec)\n<\/pre>\n\n\n\n<p>The only query that has any kind of decent performance here is last one using the &#8220;evil trick&#8221; of abusing the MySQL GROUP BY extensions in a way that is explicitly documented to not produce well-defined results. Which is really sad, since it is the only way I know of of getting the database to make the obvious execution plan in this case, which is to simply sort the data on the GROUP BY expression, and then loop over the rows picking the max row in each group on the way.<\/p>\n\n\n\n<p>In fact, in many cases I think a decent alternative is to just select <em>all<\/em> rows into the client using ORDER BY, and do the aggregation there.<\/p>\n\n\n\n<p>Now I just need someone to implement my <code>SELECT MAX(object_id, version)<\/code> &#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Selecting rows holding group-wise maximum is a favorite problem of mine, but one which only rarely pops up. But for some reason, after my last blog post on the subject, it seems to be mentioned almost daily around here. Something that I forgot to mention in the previous post is that most of the examples&hellip; <a class=\"more-link\" href=\"https:\/\/knielsen-hq.org\/w\/selecting-rows-holding-group-wise-maximum-of-a-field-part-two\/\">Continue reading <span class=\"screen-reader-text\">Selecting rows holding group-wise maximum of a field, part two<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[4,27],"_links":{"self":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts\/87"}],"collection":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/comments?post=87"}],"version-history":[{"count":3,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts\/87\/revisions"}],"predecessor-version":[{"id":90,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/posts\/87\/revisions\/90"}],"wp:attachment":[{"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/media?parent=87"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/categories?post=87"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/knielsen-hq.org\/w\/wp-json\/wp\/v2\/tags?post=87"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}