Using MASTER_GTID_WAIT() to avoid stale reads from slaves in replication

I have just implemented MASTER_GTID_WAIT() in MariaDB 10.0. This can be used to give a very elegant solution to the problem of stale reads in replication read-scaleout, without incuring the overheads normally associated with synchronous replication techniques. This idea came up recently in a discussion with Stephane Varoqui, and is similar to the concept of Lamport logical clock described in this Wikipedia article.

I wanted to describe this, hoping to induce people to test and maybe start using this, as it is a simple but very neat idea, actually.

A very typical use of MariaDB/MySQL replication is for read-scaleout. The application does all updates against a single master, which replicates to a set of slaves. The application can then distribute its reads among the slaves. If most of the database load is from reads, then this is an effective way to scale beyond what a single database server can handle.

The problem of stale reads occurs since MariaDB/MySQL replication is asynchronous. There is some delay from the commit of a change on the master server until that change becomes committed and visible on a slave server. This problem can be quite annoying in many cases, eg. user changes some preferences and reloads the page (read goes to a different slave server which is not caught up); now the change looks as if it was lost.

The idea is to use the MariaDB Global Transaction ID (GTID for short) as a logical clock. When we do an update on the master, we can obtain the associated GTID from @@LAST_GTID. Then on the slave, we can execute a MASTER_GTID_WAIT(<GTID from master>, <timeout>) before doing a query for which read consistency is critical; this will ensure that the slave will catch up before the query is run, and that no stale read results. Or if we get a timeout, we can try another slave server, which hopefully is not lagging as much behind.

Typically, we would pass around the GTID (the logical clock value) between different parts of the system to maintain consistency where it is needed and avoid stale reads. Eg. in a web application, we would set a cookie with the GTID when an update is made to the database. Then on the next HTTP request, we have the GTID which is needed in MASTER_GTID_WAIT(), even if it goes to a completely different web server. Or if we send an email, we can encode the GTID in any contained link, to be sure that it will work whichever slave server it might end up in, possibly in a different data center on a different continent.

By passing around the GTID whenever we communicate between parts of the system, we can ensure that if transaction A is known to have occured before transaction B, then any change made by A will also be visible to B. If there was no communication (direct or indirect), then B cannot know that A came before – and if there was communication, we can avoid stale reads by passing the GTID as part of that communication.

The great thing about this technique is that it is optional. We can use it for just the queries where avoiding stale reads is critical, and only there take the associated penalty (in the form of increased latency). Other parts of the database or application will not be negatively affected. This is much more flexible than using some form of synchronous replication, which will incur some penalty for all parts of the system.

So for applications that require more performance than what is currently possible to get from fully synchronous replication, we can choose consistency in a few critical places where it matters, and go for the performance of asynchronous replication in the rest.

I am hoping that some users will start to do tests with this, and I am very eager to hear any feedback and suggestions for further improvements (the maria-developers@ list is a good place for those). I think this can become a very useful technique in many cases.

By the way, note that it is not necessary to actually switch the replication to GTID to use this technique, as in MariaDB the GTIDs are always generated and replicated, even if they are not used for the slave when connecting to the master. This should make it simple for people to start testing. One will need to run MariaDB 10.0 on both the master and the slave, of course – but this could be achieved by setting up one MariaDB server as a slave of the main system (which could be running MySQL or Percona Server or MariaDB 5.5 or whatever), and then setting up another MariaDB 10.0 slave using the first slave as master.

If this turns out to be something that people find generally useful, it would be interesting to integrate this more tightly into the client protocol and client API. The @@LAST_GTID could be sent automatically back with the query results, the same way that @@LAST_INSERT_ID is done. And another option could enable the automatic execution of the MASTER_GTID_WAIT() call before queries sent to the slave servers. This could make the technique almost transparent for the end application, and could help avoid the overhead of extra client-server roundtrips and extra SQL parsing.

However, that is only worth doing if this turns out to be something that will actually be shown to be useful in practice. So if you want to see that happen eventually, then get testing and send in that feedback!

The code is available in the 10.0 bzr tree now, and will be in the next (10.0.9) release of MariaDB.

4 comments

  1. Kristian,

    It is indeed a nice idea that I had been discussing with a colleague pretty much as you seem to have implemented. I had been thinking about this for MySQL, but of course while the gtid implementation may be different the end result that is wanted is the same.
    The case I have seen is to do the data modification on the master, but then wait for a slave to catch and do further queries on a slave. This scales much better and keeps the load away from the master. Indeed having the MySQL protocol return a gtid as part of an INSERT, UPDATE, DELETE etc removes the extra round trip time and query to the master.

    You don’t say what valid timeout values are permissible but allowing 0 (no wait), and sub second waits in addition to a wait forever if not yet considered would also be very good. For replication chains involving more than 1 hop to a downstream slave from an upstream master this is very nice, so thanks.

    1. Yes, 0 means no wait, NULL means wait forever. Otherwise the timeout is in seconds, but fractional values are allowed eg. MASTER_GTID_WAIT(“0-1-100”, 0.010). The actual granularity is microseconds.

      1. So do you know of anyone using it successfully? Or is this just a theoretical idea with no practical value?

        What are the bits you lack from the server? Would the things I describe here be sufficient to have everything needed from a MariaDB server?

Leave a comment

Your email address will not be published. Required fields are marked *