此时,Server A的服务器宕机,需要将业务切换到Server B上。同时,我们又需要将Server C的复制源改成Server B。复制源修改的命令语法很简单即CHANGE MASTER TO MASTER_HOST="xxx", MASTER_LOG_FILE="xxx", MASTER_LOG_POS=nnnn。而难点在于,由于同一个事务在每台机器上所在的binlog名字和位置都不一样,那么怎么找到Server C当前同步停止点,对应Server B的master_log_file和master_log_pos是什么的时候就成为了难题。这也就是为什么M-S复制集群需要使用MMM,MHA这样的额外管理工具的一个重要原因。
这个问题在5.6的GTID出现后,就显得非常的简单。由于同一事务的GTID在所有节点上的值一致,那么根据Server C当前停止点的GTID就能唯一定位到Server B上的GTID。甚至由于MASTER_AUTO_POSITION功能的出现,我们都不需要知道GTID的具体值,直接使用CHANGE MASTER TO MASTER_HOST="xxx", MASTER_AUTO_POSITION命令就可以直接完成failover的工作。 So easy不是么?
基于GTID的主从复制简介
搭建
搭建使用了mysql_sandbox脚本为基础,先创建了一个一主三从的基于位置复制的环境。然后通过配置修改,将整个架构专为基于GTID的复制。
根据MySQL官方文档给出的GTID搭建建议。需要一次对主从节点做配置修改,并重启服务。这样的操作,显然在production环境进行升级时是不可接受的。Facebook,Booking.com,Percona都对此通过patch做了优化,做到了更优雅的升级。具体的操作方式会在以后的博文当中介绍到。这里我们就按照官方文档,进行一次实验性的升级。
主要的升级步骤会有以下几步:
确保主从同步在master上配置read_only,保证没有新数据写入修改master上的my.cnf,并重启服务修改slave上的my.cnf,并重启服务在slave上执行change master to并带上master_auto_position=1启用基于GTID的复制由于是实验环境,read_only和服务重启并无大碍。只要按照官方的GTID搭建建议做就能顺利完成升级,这里就不赘述详细过程了。下面列举了一些在升级过程中容易遇到的错误。
常见错误
gtid_mode=ON,log_slave_updates,enforce_gtid_consistency这三个参数一定要同时在my.cnf中配置。否则在mysql.err中会出现如下的报错
2016-10-08 20:11:08 32147 [ERROR] --gtid-mode=ON or UPGRADE_STEP_1 or UPGRADE_STEP_2 requires --log-bin and --log-slave-updates
2016-10-08 20:13:53 32570 [ERROR] --gtid-mode=ON or UPGRADE_STEP_1 requires --enforce-gtid-consistency
change master to 后的warnings
在按照文档的操作change master to后,会发现有两个warnings。其实是两个安全性警告,不影响正常的同步(有兴趣的读者可以看下关于该warning的具体介绍。warning的具体内容如下:
slave1 [localhost] {msandbox} ((none)) > stop slave;Query OK, 0 rows affected (0.03 sec)slave1 [localhost] {msandbox} ((none)) > change master to master_host="127.0.0.1",master_port =21288,master_user="rsandbox",master_password="rsandbox",master_auto_position=1;Query OK, 0 rows affected, 2 warnings (0.04 sec)slave1 [localhost] {msandbox} ((none)) > show warnings;+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Level | Code | Message |+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Note | 1759 | Sending passwords in plain text without SSL/TLS is extremely insecure. || Note | 1760 | Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the "START SLAVE Syntax" in the MySQL Manual for more information. |+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+2 rows in set (0.00 sec)实验一:如果slave所需要事务对应的GTID在master上已经被purge了
master [localhost] {msandbox} (test) > show global variables like "%gtid%";+---------------------------------+----------------------------------------+| Variable_name | Value |+---------------------------------+----------------------------------------+| binlog_gtid_simple_recovery | OFF || enforce_gtid_consistency | ON || gtid_executed | 24024e52-bd95-11e4-9c6d-926853670d0b:1 || gtid_mode | ON || gtid_owned | || gtid_purged | || simplified_binlog_gtid_recovery | OFF |+---------------------------------+----------------------------------------+7 rows in set (0.01 sec)master [localhost] {msandbox} (test) > flush logs;create table gtid_test2 (ID int) engine=innodb;Query OK, 0 rows affected (0.04 sec)Query OK, 0 rows affected (0.02 sec)master [localhost] {msandbox} (test) > flush logs;create table gtid_test3 (ID int) engine=innodb;Query OK, 0 rows affected (0.04 sec)Query OK, 0 rows affected (0.04 sec)master [localhost] {msandbox} (test) > show master status;+------------------+----------+--------------+------------------+------------------------------------------+| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |+------------------+----------+--------------+------------------+------------------------------------------+| mysql-bin.000005 | 359 | | | 24024e52-bd95-11e4-9c6d-926853670d0b:1-3 |+------------------+----------+--------------+------------------+------------------------------------------+1 row in set (0.00 sec)master [localhost] {msandbox} (test) > purge binary logs to "mysql-bin.000004";Query OK, 0 rows affected (0.03 sec)master [localhost] {msandbox} (test) > show global variables like "%gtid%";+---------------------------------+------------------------------------------+| Variable_name | Value |+---------------------------------+------------------------------------------+| binlog_gtid_simple_recovery | OFF || enforce_gtid_consistency | ON || gtid_executed | 24024e52-bd95-11e4-9c6d-926853670d0b:1-3 || gtid_mode | ON || gtid_owned | || gtid_purged | 24024e52-bd95-11e4-9c6d-926853670d0b:1 || simplified_binlog_gtid_recovery | OFF |+---------------------------------+------------------------------------------+7 rows in set (0.00 sec)在slave2上重新做一次主从,以下命令在slave2上执行
slave2 [localhost] {msandbox} ((none)) > change master to master_host="127.0.0.1",master_port =21288,master_user="rsandbox",master_password="rsandbox",master_auto_position=1;Query OK, 0 rows affected, 2 warnings (0.04 sec)slave2 [localhost] {msandbox} ((none)) > start slave;Query OK, 0 rows affected (0.01 sec)slave2 [localhost] {msandbox} ((none)) > show slave statusG*************************** 1. row ***************************......Slave_IO_Running: NoSlave_SQL_Running: Yes......Last_Errno: 0Last_Error:Skip_Counter: 0Exec_Master_Log_Pos: 0Relay_Log_Space: 151......Last_IO_Errno: 1236Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: "The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires."Last_SQL_Errno: 0Last_SQL_Error:......Auto_Position: 11 row in set (0.00 sec)实验二:忽略purged的部分,强行同步
master [localhost] {msandbox} (test) > show global variables like "%gtid%";+---------------------------------+------------------------------------------+| Variable_name | Value |+---------------------------------+------------------------------------------+| binlog_gtid_simple_recovery | OFF || enforce_gtid_consistency | ON || gtid_executed | 24024e52-bd95-11e4-9c6d-926853670d0b:1-3 || gtid_mode | ON || gtid_owned | || gtid_purged | 24024e52-bd95-11e4-9c6d-926853670d0b:1 || simplified_binlog_gtid_recovery | OFF |+---------------------------------+------------------------------------------+7 rows in set (0.00 sec)在slave上通过set global gtid_purged="xxxx"的方式,跳过已经purge的部分
slave2 [localhost] {msandbox} ((none)) > stop slave;Query OK, 0 rows affected (0.04 sec)slave2 [localhost] {msandbox} ((none)) > set global gtid_purged = "24024e52-bd95-11e4-9c6d-926853670d0b:1";Query OK, 0 rows affected (0.05 sec)slave2 [localhost] {msandbox} ((none)) > start slave;Query OK, 0 rows affected (0.01 sec)slave2 [localhost] {msandbox} ((none)) > show slave statusG *************************** 1. row ***************************Slave_IO_State: Waiting for master to send event......Master_Log_File: mysql-bin.000005Read_Master_Log_Pos: 359Relay_Log_File: mysql_sandbox21290-relay-bin.000004Relay_Log_Pos: 569Relay_Master_Log_File: mysql-bin.000005Slave_IO_Running: YesSlave_SQL_Running: Yes......Exec_Master_Log_Pos: 359Relay_Log_Space: 873......Master_Server_Id: 1Master_UUID: 24024e52-bd95-11e4-9c6d-926853670d0bMaster_Info_File: /data/mysql/rsandbox_mysql-5_6_23/node2/data/master.infoSQL_Delay: 0SQL_Remaining_Delay: NULLSlave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it......Retrieved_Gtid_Set: 24024e52-bd95-11e4-9c6d-926853670d0b:2-3Executed_Gtid_Set: 24024e52-bd95-11e4-9c6d-926853670d0b:1-3Auto_Position: 11 row in set (0.00 sec)可以看到此时slave已经可以正常同步,并补齐了24024e52-bd95-11e4-9c6d-926853670d0b:2-3范围的binlog日志。