My previous writings were mostly theoretical, so I wanted to give a more practical example, showing the actual state of the current code. I also wanted to show how I have tried to make the feature fit well into the existing replication features, without requiring the user to enable lots of options or understand lots of restrictions before being able to use it.
So let us start! We will build the code from lp:~maria-captains/maria/10.0-mdev26, which at the time of writing is at revision knielsen@knielsen-hq.org-20130214134205-403yjqvzva6xk52j.
First, we start a master server on port 3310 and put a bit of data into it:
server1> use test;
server1> create table t1 (a int primary key, b int) engine=innodb;
server1> insert into t1 values (1,1);
server1> insert into t1 values (2,1);
server1> insert into t1 values (3,1);
To provision a slave, we take a mysqldump:
bash$ mysqldump --master-data=2 --single-transaction -uroot test > /tmp/dump.sql
Note that with --master-data=2 --single-transaction we obtain the exact binlog position corresponding to the data in the dump. Since MariaDB 5.3, this is completely non-blocking on the server (it does not do FLUSH TABLES WITH READ LOCK):
bash$ grep "CHANGE MASTER" /tmp/dump.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=910;
Meanwhile, the master server has a couple more transactions:
server1> insert into t1 values (4,2);
server1> insert into t1 values (5,2);
Now let us start up the slave server on port 3311, load the dump, and start replicating from the master:
bash$ mysql -uroot test < /tmp/dump.sql

server2> change master to master_host='127.0.0.1', master_port=3310, master_user='root',
         master_log_file='master-bin.000001', master_log_pos=910;
server2> start slave;
server2> select * from t1;
+---+------+
| a | b    |
+---+------+
| 1 |    1 |
| 2 |    1 |
| 3 |    1 |
| 4 |    2 |
| 5 |    2 |
+---+------+
5 rows in set (0.00 sec)
So the slave is up to date. In addition, when the slave connects to the master, it downloads the current GTID replication state, so everything is now ready for using global transaction ID. Let us promote the slave to be the new master, and then later make the old master a slave of the new master. So stop the slave thread on the old slave, and run another transaction to simulate it being the new master:
server2> stop slave;
server2> insert into t1 values (6,3);
Finally, let us attach the old master as a slave using global transaction ID:
server1> change master to master_host='127.0.0.1', master_port=3311, master_user='root',
         master_gtid_pos=auto;
server1> start slave;
server1> select * from t1;
+---+------+
| a | b    |
+---+------+
| 1 |    1 |
| 2 |    1 |
| 3 |    1 |
| 4 |    2 |
| 5 |    2 |
| 6 |    3 |
+---+------+
6 rows in set (0.01 sec)
The old master is now running as a slave and is up to date with the new master.
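As a quick sanity check (this is ordinary replication monitoring, nothing specific to the new feature), one can verify that both slave threads are running and caught up:

server1> show slave status\G

and confirm that Slave_IO_Running and Slave_SQL_Running both say Yes and that Seconds_Behind_Master is 0.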
So that is it! A short post from me for once, but that is the whole point. Replication with MariaDB Global Transaction ID works much as it always did. The only new thing here is when we issue the CHANGE MASTER to make the old master a slave of the new master. We do not have to manually try to compute or guess the correct binlog position on the new master; we can just specify MASTER_GTID_POS=AUTO and the servers figure out the rest for themselves.
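For contrast, here is a rough sketch of what that step looks like without global transaction ID. You would have to locate, in the new master's binlog, the exact point corresponding to the last transaction the old master already has (for example by inspecting the binlog with mysqlbinlog), and then supply it explicitly. The file name and position below are made up purely for illustration:

server1> change master to master_host='127.0.0.1', master_port=3311, master_user='root',
         master_log_file='slave-bin.000001', master_log_pos=12345;
server1> start slave;

Getting that position wrong means either losing or double-applying transactions, which is exactly the kind of mistake MASTER_GTID_POS=AUTO is meant to eliminate.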
I hope I managed to show more concretely my ideas with MariaDB Global Transaction ID. Comments and questions are most welcome, as always. Everything above consists of actual commands that work on the current code on Launchpad. Everything else may or may not work yet, as this is work in progress, just so you know!
Very nice and simple.
Another beautiful design from you. I like the MASTER_GTID_POS=AUTO! Congratulations.
Doing a completely non-blocking mysqldump had somehow escaped me! I followed the link to the knowledgebase. Do I understand correctly that this is with the following assumptions:
– you must not run out of InnoDB redo-log space (or equivalent for other engines)
– you must not be doing DML against MyISAM or other engines that don’t support consistent snapshot
> Another beautiful design from you. I like the MASTER_GTID_POS=AUTO!
> Congratulations.
Thanks! I’m glad you liked it.
> – you must not run out of InnoDB redo-log space (or equivalent for other
> engines)
There is no such limitation. InnoDB never runs out of redo-log space; it just forces a checkpoint (are you thinking of Oracle RDBMS “snapshot too old” woes? InnoDB does not have that).
> – you must not be doing DML against MyISAM or other engines that don’t
> support consistent snapshot
Right. Things will still work if you do. However, the obtained binlog position may not be consistent with the dumped data for those tables. Same with DDL.
>> – you must not run out of InnoDB redo-log space (or equivalent for other
>> engines)
>
> There is no such limitation. InnoDB never runs out of redo-log space, it just
> forces a checkpoint (are you thinking of Oracle RDBMS “snapshot too old” woes?
> InnoDB does not have that).
I wasn’t thinking of anything in particular, just intuitively poking in some direction.
I suppose what I’m really asking, is: Clearly it can’t be possible to keep a consistent snapshot open indefinitively? Wouldn’t InnoDB at some point either block new updates or alternatively have to release the old snapshot? Or will it happily keep around old MVCC copies of records indefinitively?
> Clearly it can’t be possible to keep a consistent snapshot open
> indefinitely?
I believe InnoDB will try. Of course, the undo tablespace will grow if there are continuously new updates happening. Basically, you will get purge lag.
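If you want to watch this effect while a long dump is running, one way (standard InnoDB monitoring, not something specific to this feature) is to look at the history list length, which roughly measures how much un-purged undo has accumulated:

server1> show engine innodb status\G

and check the “History list length” line in the TRANSACTIONS section; it will keep growing while the old read view is held open under a steady write load.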
Mysqldump is a good option for smaller databases (where taking and restoring a dump can be done in reasonable time), due to its simplicity and flexibility. For bigger data sets, XtraBackup is a better option. It also provides a way to obtain a non-blocking consistent binlog position, and it does not cause purge lag.
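For example, with Percona XtraBackup something along these lines takes a non-blocking backup of InnoDB tables and records the matching binlog position (exact options and output file names depend on the XtraBackup version, so treat this only as a sketch):

bash$ innobackupex --user=root --no-lock /backups/
bash$ cat /backups/*/xtrabackup_binlog_pos_innodb

The second file plays the same role as the CHANGE MASTER line in the mysqldump output above: it gives the binlog file name and position corresponding to the backed-up data.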
Seems very nice and good =) Waiting for the stable release.
One question: in my work I have to use a chain replication topology.
Imagine there is such a chain: A->B->C->D
and at some point B falls down (dead).
Will I be able to connect D or C to A easily, without computing and selecting the right binlog position?
In this topology C and D know nothing about the position where replication was stopped on B.
> Imagine there is such a chain: A->B->C->D
> Will I be able to connect D or C to A easily, without computing and selecting
> the right binlog position?
Yes. Just use CHANGE MASTER TO … MASTER_GTID_POS=AUTO on D or C, specifying the connection information for A.
This will work in general for any server in any correctly setup replication topology. I spent a lot of effort in the design to make this work properly and automatically.
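Concretely, if B dies in the A->B->C->D chain, reattaching C directly to A would look something like this (host and user details are made up for the example; the syntax is the same MASTER_GTID_POS=AUTO shown in the post above):

serverC> stop slave;
serverC> change master to master_host='hostA', master_port=3306, master_user='repl',
         master_gtid_pos=auto;
serverC> start slave;

D can keep replicating from C unchanged (assuming C logs the updates it applies, as it must in a chain topology).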
Hello Kristian,
I’ve been enjoying reading all your blog postings about this and I’m looking forward to testing the feature. I think one of the most useful aspects for this feature is in a multi-master approach with multiple data centers. Say that you have 4 data centers (A,B,C,D) and instead of replicating in a chain each server directly replicates each other server, ie you have 4 domains where each domain could actually be multiple servers in a local replication hierarchy (say X1,X2,X3,X4). So here the question is, what if one of the masters (say B1) would fail. Now B2 would be upgraded to become the master of the B data center and B3 and B4 would start replicating from B2. At the same time B2 would start replicating to and from the other masters, ie A1, C1 and D1. All of this should be pretty much automatic based on the GTID, right?
Now the real problem is, because of the asynch nature of replication, what if an event was in B1 but didn’t yet get to the other B servers? or if B3 had the latest event of the A domain but B2 had the latest event of the B domain etc. How do all the servers now make sure they are all synchronized for each domain? In particular as you only want 1 master per domain. Or can this not be a problem as all events on B1 were locally serialized?
So can an event be lost and could there be a synchronization problem?
> aspects for this feature is in a multi-master approach with multiple data
> centers. Say that you have 4 data centers (A,B,C,D) and instead of
> replicating in a chain each server directly replicates each other server,
> ie you have 4 domains where each domain could actually be multiple
> servers in a local replication hierarchy (say X1,X2,X3,X4). So here the
Are you going to have parallel updates in all four data centers A, B, C, and D? Then you need 4 domain_ids, and you need to ensure that updates are not conflicting. Or will you ensure that only one data center at a time does updates? Then you only need a single domain.
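In the parallel-update case, each data center would get its own replication domain. A rough sketch, assuming a per-server domain id setting along the lines of the design (the exact variable name may differ in the work-in-progress code):

serverA1> set global gtid_domain_id = 1;
serverB1> set global gtid_domain_id = 2;
serverC1> set global gtid_domain_id = 3;
serverD1> set global gtid_domain_id = 4;

so that transactions originating in different data centers form independent streams within each binlog.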
The primary issue here is that events can take multiple paths to the same destination, so somehow duplicate apply of events needs to be avoided. I.e. if an event originates at A, it can come to D three times: from A directly, and via B and C. This could perhaps be handled by comparing GTIDs and only applying the event if a more recent event has not yet been applied. Or perhaps by only sending from B to D events originating at B.
> question is, what if one of the masters (say B1) would fail. Now B2 would
> be upgraded to become the master of the B data center and B3 and B4 would
> start replicating from B2. At the same time B2 would start replicating to
> and from the other masters, ie A1, C1 and D1. All of this should be
> pretty much automatic based on the GTID, right?
B3 and B4, as well as A1, C1, and D1, would all be able to use MASTER_GTID_POS=AUTO to resume at the correct places in the binlog of B2, yes.
> Now the real problem is, because of the asynch nature of replication,
> what if an event was in B1 but didn’t yet get to the other B servers? or
For example, the last event of B1 may have reached A but not B2. So you would need to bring B2 up to date before starting to use it as master. You do this by making it replicate briefly from A, C, and D until it is up-to-date with their current state. Then you can remove the read-only status of B2 and let B3, B4, A, C, and D connect to it as slaves.
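A sketch of that catch-up step on B2, using the same MASTER_GTID_POS=AUTO mechanism described in the post (host names and the replication user are made up here):

serverB2> change master to master_host='hostA1', master_port=3306, master_user='repl',
          master_gtid_pos=auto;
serverB2> start slave;

Wait until B2 has applied everything A1 has, stop the slave, repeat for C1 and D1, and only then clear read_only on B2 and let the others connect to it as their new master.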
> if B3 had the latest event of the A domain but B2 had the latest event of
> the B domain etc. How do all the servers now make sure they are all
> synchronized for each domain? In particular as you only want 1 master per
> domain. Or can this not be a problem as all events on B1 were locally
> serialized?
If you have 4 domains, each server remembers its last applied GTID within each domain. So B3 can start replicating each domain at a different position inside the binlog of B2. That is the main purpose of the domain_ids: to allow keeping track of multiple independent streams within a single binlog.
> So can an event be lost and could there be a synchronization problem?
I think it should be possible to set this up so that no events are lost and synchronization is done correctly. The main issue is to solve the multiple-path problem, which I have not considered before but sounds doable.
Note that global transaction ID provides only the means for doing this by keeping track of positions. It does not include automated failover and synchronization; such things must be scripted on top.