MySQL MHA高可用环境搭建

一、安装MHA基本环境
1. 安装MHA node
（1）基本环境说明，本文参考互联网文章学习，搭建MHA与测试如下。
参考文档：http://www.linuxidc.com/Linux/2016-05/130923.htm
角色 IP地址主机名
=============================================
Master 192.168.1.121 node1
Slave 192.168.1.122 node2
Slave 192.168.1.123 node3
Monitor host 192.168.1.125 node5
（2）在node1,node2,node3,node5操作：
# vi /etc/hosts
192.168.1.121 node1
192.168.1.122 node2
192.168.1.123 node3
192.168.1.125 node5
安装MHA node节点软件包：
# rpm -ivh http://dl.Fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# yum install perl-DBD-MySQL perl-CPAN -y
# tar xf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
# make && make install

2. 安装MHA Manager
在node5管理节点上操作：注：MHA Manager主机也是需要安装MHA Node,MHA Manger
# yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
# tar xf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
# make && make install
#说明：安装的脚本程序都在/usr/local/bin/目录下。

3. 节点间配置SSH登录无密码验证（MHA主机之间使用key登录）
在node5（Monitor）：
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
在node1（Master）：
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
在node2 （slave）：
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
在node3 （slave）：
# ssh-keygen -t rsa
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
二、搭建主从复制环境
1. 主从复制环境配置过程
（1） mysql安装过程略,但是三节点要创建如下链接
node1（主），node2（主备从）,node3（从）
注意：创建如下链接：
ln -s /usr/local/mysql/bin/* /usr/local/bin/
node1 my.cnf
server-id = 1
binlog-format = ROW
log-bin = master-bin
log-bin-index = master-bin.index
log-slave-updates = true
relay_log_purge = 0
node2 my.cnf
server-id = 2
binlog-format = ROW
log-bin = master-bin
log-bin-index = master-bin.index
log-slave-updates = true
relay_log_purge = 0
node3 my.cnf
binlog-format = ROW
log-bin = mysql-bin
relay-log = slave-relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = true
server-id = 11
skip-name-resolve
relay_log_purge = 0
（2）在node1 （Master）上备份一份完整的数据：
# mysqldump -uroot -p123456 --master-data=2 --single-transaction -R --triggers -A > all.sql
其中--master-data=2代表备份时刻记录master的Binlog位置和Position。
（3）在node1 （Master）上创建复制用户:
mysql> grant replication slave on *.* to "repl"@"192.168.1.%" identified by "123456";
mysql> flush privileges;
（4）查看主库备份时的binlog名称和位置，MASTER_LOG_FILE和MASTER_LOG_POS：
# head -n 30 all.sql | grep "CHANGE MASTER TO"
-- CHANGE MASTER TO MASTER_LOG_FILE="master-bin.000004", MASTER_LOG_POS=120;
（5）把备份复制到192.168.1.122和192.168.1.123
# scp all.sql 192.168.1.122:/root/
# scp all.sql 192.168.1.123:/root/
（6）分别在两台服务器上导入备份，执行复制相关命令
在node2、node3 主机上操作：
# mysql -uroot -p123456 < all.sql
mysql> stop slave;
CHANGE MASTER TO
MASTER_HOST="192.168.1.121",
MASTER_USER="repl",
MASTER_PASSWORD="123456",
MASTER_LOG_FILE="master-bin.000004",
MASTER_LOG_POS=120;
mysql> start slave;
mysql> show slave statusG
2. 创建MHA管理用户,在master上创建。
mysql> grant all privileges on *.* to "root"@"192.168.1.%" identified by "123456";
mysql> flush privileges;

三、配置Keepalived VIP
vip配置可以采用两种方式，一种通过keepalived的方式管理虚拟ip的浮动；一人是通过脚本方式，本文通过keepalived方式实现
1. 在node1（Master）与node2（备选主节点）安装keepalived。
# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
# tar xf keepalived-1.2.12.tar.gz
# cd keepalived-1.2.12
# ./configure --prefix=/usr/local/keepalived
# make && make install
# cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
# mkdir /etc/keepalived
# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
2. 配置keepalived的配置文件，在node1（master）上配置操作如下:
注：keepalived配置成backup->backup，即IP地址切换后，主起来后IP地址不切换，本文监控脚本由MHA提供，keepalived不提供对mysqld的监控。
# vi /etc/keepalived/keepalived.conf
！ Configuration File for keepalived
global_defs {
notification_email {
abc@163.com
}
notification_email_from dba@dbserver.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id MySQL-HA
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 150
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.1.130
}
}
3. 配置keepalived的配置文件，在node2（备用节点）上配置操作如下:
# vi /etc/keepalived/keepalived.conf
！ Configuration File for keepalived
global_defs {
notification_email {
abc@163.com
}
notification_email_from dba@dbserver.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id MySQL-HA
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 120
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.1.130
}
}
4. node1,node2启动keepalived服务
# service keepalived start
# chkconfig keepalived on
5. node1查看VIP启动情况
[root@node1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:4e:53:71 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.130/32 scope global eth0
inet6 fe80::20c:29ff:fe4e:5371/64 scope link
valid_lft forever preferred_lft forever
四、配置MHA
1. monitor创建MHA的工作目录，并且创建相关配置文件（在软件包解压后的目录里面有样例配置文件）。
# mkdir -p /etc/masterha
# mkdir -p /var/log/masterha/app1
# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/
修改app1.cnf配置文件，修改后的文件内容如下:
# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/usr/local/mysql/data/
master_ip_failover_script= /usr/local/bin/master_ip_failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change
password=123456
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=repl
report_script=/usr/local/bin/send_report
ssh_user=root
[server1]
hostname=192.168.1.121
port=3306
[server2]
hostname=192.168.1.122
port=3306
candidate_master=1
check_repl_delay=0
[server3]
hostname=192.168.1.123
port=3306
说明:
master_ip_failover_script= /usr/local/bin/master_ip_failover #MHA自动切换执行的脚本，需要修改
master_ip_online_change_script= /usr/local/bin/master_ip_online_change #手动切换需要执行的脚本，需要修改
report_script=/usr/local/bin/send_report #切换时发送邮件进行报告，需要修改
2. 设置relay log的清除方式（在每个slave节点上）：
（1）在node2,node3 从节点上操作：
将relay_log_purge=0加入my.cnf配置文件，前面已经配置。
（2）设置定期清理relay脚本（node2,node3上操作）：
# cat purge_relay_log.sh
#！/bin/bash
user=root
passwd=123456
port=3306
log_dir="/data/masterha/log"
work_dir="/data"
purge="/usr/local/bin/purge_relay_logs"
if [ ！ -d $log_dir ]
then
mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
配置定时计划任务
# crontab -e
0 4 * * * /bin/bash /root/purge_relay_log.sh

3. 要想把keepalived服务引入MHA，我们只需要修改切换是触发的脚本文件master_ip_failover即可，在该脚本中添加在master发生宕机时对keepalived的处理。
（1）编辑脚本/usr/local/bin/master_ip_failover，修改后如下:
# vi /usr/local/bin/master_ip_failover
#！/usr/bin/env perl
use strict;
use warnings FATAL => "all";
use Getopt::Long;
my （
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
）;
my $vip = "192.168.1.130";
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";
GetOptions（
"command=s" => $command,
"ssh_user=s" => $ssh_user,
"orig_master_host=s" => $orig_master_host,
"orig_master_ip=s" => $orig_master_ip,
"orig_master_port=i" => $orig_master_port,
"new_master_host=s" => $new_master_host,
"new_master_ip=s" => $new_master_ip,
"new_master_port=i" => $new_master_port,
）;
exit &main（）;
sub main {
print " IN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip=== ";
if （ $command eq "stop" || $command eq "stopssh" ） {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host ";
&stop_vip（）;
$exit_code = 0;
};
if （$@） {
warn "Got Error: $@ ";
exit $exit_code;
}
exit $exit_code;
}
elsif （ $command eq "start" ） {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host ";
&start_vip（）;
$exit_code = 0;
};
if （$@） {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif （ $command eq "status" ） {
print "Checking the Status of the script.. OK ";
exit 0;
}
else {
&usage（）;
exit 1;
}
}
sub start_vip（） {
`ssh $ssh_user@$new_master_host " $ssh_start_vip "`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip（） {
`ssh $ssh_user@$orig_master_host " $ssh_stop_vip "`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port ";
}
（2）编辑脚本master_ip_online_change，修改后如下:
#！/usr/bin/env perl
# Copyright （C） 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# （at your option） any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => "all";
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw（ sleep gettimeofday tv_interval ）;
use Data::Dumper;
my $_tstart;
my $_running_interval = 0.1;
my （
$command, $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip, $orig_master_port, $orig_master_user,
$orig_master_password, $orig_master_ssh_user, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password, $new_master_ssh_user
）;
my $vip = "192.168.1.130/24";
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $orig_master_ssh_port = 22;
my $new_master_ssh_port = 22;
GetOptions（
"command=s" => $command,
"orig_master_is_new_slave" => $orig_master_is_new_slave,
"orig_master_host=s" => $orig_master_host,
"orig_master_ip=s" => $orig_master_ip,
"orig_master_port=i" => $orig_master_port,
"orig_master_user=s" => $orig_master_user,
"orig_master_password=s" => $orig_master_password,
"orig_master_ssh_user=s" => $orig_master_ssh_user,
"new_master_host=s" => $new_master_host,
"new_master_ip=s" => $new_master_ip,
"new_master_port=i" => $new_master_port,
"new_master_user=s" => $new_master_user,
"new_master_password=s" => $new_master_password,
"new_master_ssh_user=s" => $new_master_ssh_user,
"orig_master_ssh_port=i" => $orig_master_ssh_port,
"new_master_ssh_port=i" => $new_master_ssh_port,
）;
exit &main（）;
sub current_time_us {
my （ $sec, $microsec ） = gettimeofday（）;
my $curdate = localtime（$sec）;
return $curdate . " " . sprintf（ "%06d", $microsec ）;
}
sub sleep_until {
my $elapsed = tv_interval（$_tstart）;
if （ $_running_interval > $elapsed ） {
sleep（ $_running_interval - $elapsed ）;
}
}
sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless （$running_time_threshold）;
$type = 0 unless （$type）;
my @threads;
my $sth = $dbh->prepare（"SHOW PROCESSLIST"）;
$sth->execute（）;
while （ my $ref = $sth->fetchrow_hashref（）） {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^s*（.*？）s*$/$1/ if defined（$info）;
next if （ $my_connection_id == $id ）;
next if （ defined（$query_time） && $query_time < $running_time_threshold ）;
next if （ defined（$command） && $command eq "Binlog Dump" ）;
next if （ defined（$user） && $user eq "system user" ）;
next
if （ defined（$command）
&& $command eq "Sleep"
&& defined（$query_time）
&& $query_time >= 1 ）;
if （ $type >= 1 ） {
next if （ defined（$command） && $command eq "Sleep" ）;
next if （ defined（$command） && $command eq "Connect" ）;
}
if （ $type >= 2 ） {
next if （ defined（$info） && $info =~ m/^select/i ）;
next if （ defined（$info） && $info =~ m/^show/i ）;
}
push @threads, $ref;
}
return @threads;
}
sub main {
if （ $command eq "stop" ） {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master （to avoid accident）
my $new_master_handler = new MHA::DBHelper（）;
# args: hostname, port, user, password, raise_error（die_on_error）_or_not
$new_master_handler->connect（ $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 ）;
print current_time_us（） . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only（）;
if （ $new_master_handler->is_read_only（）） {
print "ok. ";
}
else {
die "Failed！ ";
}
$new_master_handler->disconnect（）;
# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper（）;
$orig_master_handler->connect（ $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 ）;
## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
$orig_master_handler->disable_log_bin_local（）;
print current_time_us（） . " Drpping app user on the orig master.. ";
#FIXME_xxx_drop_app_user（$orig_master_handler）;
## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util（ $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} ）;
while （ $time_until_read_only > 0 && $#threads >= 0 ） {
if （ $time_until_read_only % 5 == 0 ） {
printf
"%s Waiting all running %d threads are disconnected.. （max %d milliseconds） ",
current_time_us（）, $#threads + 1, $time_until_read_only * 100;
if （ $#threads < 5 ） {
print Data::Dumper->new（ [$_] ）->Indent（0）->Terse（1）->Dump . " "
foreach （@threads）;
}
}
sleep_until（）;
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util（ $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} ）;
}
## Setting read_only=1 on the current master so that nobody（except SUPER） can write
print current_time_us（） . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only（）;
if （ $orig_master_handler->is_read_only（）） {
print "ok. ";
}
else {
die "Failed！ ";
}
## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util（ $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} ）;
while （ $time_until_kill_threads > 0 && $#threads >= 0 ） {
if （ $time_until_kill_threads % 5 == 0 ） {
printf
"%s Waiting all running %d queries are disconnected.. （max %d milliseconds） ",
current_time_us（）, $#threads + 1, $time_until_kill_threads * 100;
if （ $#threads < 5 ） {
print Data::Dumper->new（ [$_] ）->Indent（0）->Terse（1）->Dump . " "
foreach （@threads）;
}
}
sleep_until（）;
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util（ $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} ）;
}
## Terminating all threads
print current_time_us（） . " Killing all application threads.. ";
$orig_master_handler->kill_threads（@threads） if （ $#threads >= 0 ）;
print current_time_us（） . " done. ";
$orig_master_handler->enable_log_bin_local（）;
$orig_master_handler->disconnect（）;
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
eval {
`ssh -p$orig_master_ssh_port $orig_master_ssh_user@$orig_master_host " $ssh_stop_vip "`;
};
if （$@） {
warn $@;
}
$exit_code = 0;
};
if （$@） {
warn "Got Error: $@ ";
exit $exit_code;
}
exit $exit_code;
}
elsif （ $command eq "start" ） {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master"s ip to the catalog database
# We don"t return error even though activating updatable accounts/ip failed so that we don"t interrupt slaves" recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper（）;
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect（ $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 ）;
## Set read_only=0 on the new master
$new_master_handler->disable_log_bin_local（）;
print current_time_us（） . " Set read_only=0 on the new master. ";
$new_master_handler->disable_read_only（）;
## Creating an app user on the new master
print current_time_us（） . " Creating app user on the new master.. ";
#FIXME_xxx_create_app_user（$new_master_handler）;
$new_master_handler->enable_log_bin_local（）;
$new_master_handler->disconnect（）;
## Update master ip on the catalog database, etc
`ssh -p$new_master_ssh_port $new_master_ssh_user@$new_master_host " $ssh_start_vip "`;
$exit_code = 0;
};
if （$@） {
warn "Got Error: $@ ";
exit $exit_code;
}
exit $exit_code;
}
elsif （ $command eq "status" ） {
# do nothing
exit 0;
}
else {
&usage（）;
exit 1;
}
}
sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port ";
die;
}
（3）编辑脚本send_report，修改后如下:
#！/usr/bin/perl
use strict;
use warnings FATAL => "all";
use Mail::Sender;
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my （ $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body ）;
my $smtp="smtp.163.com";
my $mail_from="xxxx";
my $mail_user="xxxxx";
my $mail_pass="xxxxx";
my $mail_to=["xxxx","xxxx"];
GetOptions（
"orig_master_host=s" => $dead_master_host,
"new_master_host=s" => $new_master_host,
"new_slave_hosts=s" => $new_slave_hosts,
"subject=s" => $subject,
"body=s" => $body,
）;
mailToContacts（$smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body）;
sub mailToContacts {
my （ $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ） = @_;
open my $DEBUG, "> /tmp/monitormail.log"
or die "Can"t open the debug file:$！ ";
my $sender = new Mail::Sender {
ctype => "text/plain; charset=utf-8",
encoding => "utf-8",
smtp => $smtp,
from => $mail_from,
auth => "LOGIN",
TLS_allowed => "0",
authid => $user,
authpwd => $passwd,
to => $mail_to,
subject => $subject,
debug => $DEBUG
};
$sender->MailMsg（
{ msg => $msg,
debug => $DEBUG
}
） or print $Mail::Sender::Error;
return 1;
}
# Do whatever you want here
exit 0;

五、MHA的日常管理
1. 检查SSH配置（node5 Monitor 监控节点上操作），如下：
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sun May 1 22:05:12 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun May 1 22:05:12 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun May 1 22:05:12 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun May 1 22:05:12 2016 - [info] Starting SSH connection tests..
Sun May 1 22:05:14 2016 - [debug]
Sun May 1 22:05:12 2016 - [debug] Connecting via SSH from root@192.168.1.121（192.168.1.121:22） to root@192.168.1.122（192.168.1.122:22）..
Sun May 1 22:05:13 2016 - [debug] ok.
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from root@192.168.1.121（192.168.1.121:22） to root@192.168.1.123（192.168.1.123:22）..
Sun May 1 22:05:13 2016 - [debug] ok.
Sun May 1 22:05:14 2016 - [debug]
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from root@192.168.1.122（192.168.1.122:22） to root@192.168.1.121（192.168.1.121:22）..
Sun May 1 22:05:13 2016 - [debug] ok.
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from root@192.168.1.122（192.168.1.122:22） to root@192.168.1.123（192.168.1.123:22）..
Sun May 1 22:05:14 2016 - [debug] ok.
Sun May 1 22:05:14 2016 - [debug]
Sun May 1 22:05:13 2016 - [debug] Connecting via SSH from root@192.168.1.123（192.168.1.123:22） to root@192.168.1.121（192.168.1.121:22）..
Sun May 1 22:05:14 2016 - [debug] ok.
Sun May 1 22:05:14 2016 - [debug] Connecting via SSH from root@192.168.1.123（192.168.1.123:22） to root@192.168.1.122（192.168.1.122:22）..
Sun May 1 22:05:14 2016 - [debug] ok.
Sun May 1 22:05:14 2016 - [info] All SSH connection tests passed successfully.
2. 检查整个复制环境状况（node5 监控节点上操作），如下：
# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun May 1 22:46:44 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun May 1 22:46:44 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun May 1 22:46:44 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun May 1 22:46:44 2016 - [info] MHA::MasterMonitor version 0.56.
Sun May 1 22:46:45 2016 - [info] GTID failover mode = 0
Sun May 1 22:46:45 2016 - [info] Dead Servers:
Sun May 1 22:46:45 2016 - [info] Alive Servers:
Sun May 1 22:46:45 2016 - [info] 192.168.1.121（192.168.1.121:3306）
Sun May 1 22:46:45 2016 - [info] 192.168.1.122（192.168.1.122:3306）
Sun May 1 22:46:45 2016 - [info] 192.168.1.123（192.168.1.123:3306）
Sun May 1 22:46:45 2016 - [info] Alive Slaves:
Sun May 1 22:46:45 2016 - [info] 192.168.1.122（192.168.1.122:3306） Version=5.6.29-log （oldest major version between slaves） log-bin:enabled
Sun May 1 22:46:45 2016 - [info] Replicating from 192.168.1.121（192.168.1.121:3306）
Sun May 1 22:46:45 2016 - [info] Primary candidate for the new Master （candidate_master is set）
Sun May 1 22:46:45 2016 - [info] 192.168.1.123（192.168.1.123:3306） Version=5.6.29-log （oldest major version between slaves） log-bin:enabled
Sun May 1 22:46:45 2016 - [info] Replicating from 192.168.1.121（192.168.1.121:3306）
Sun May 1 22:46:45 2016 - [info] Current Alive Master: 192.168.1.121（192.168.1.121:3306）
Sun May 1 22:46:45 2016 - [info] Checking slave configurations..
Sun May 1 22:46:45 2016 - [info] read_only=1 is not set on slave 192.168.1.122（192.168.1.122:3306）.
Sun May 1 22:46:45 2016 - [warning] relay_log_purge=0 is not set on slave 192.168.1.122（192.168.1.122:3306）.
Sun May 1 22:46:45 2016 - [info] read_only=1 is not set on slave 192.168.1.123（192.168.1.123:3306）.
Sun May 1 22:46:45 2016 - [warning] relay_log_purge=0 is not set on slave 192.168.1.123（192.168.1.123:3306）.
Sun May 1 22:46:45 2016 - [info] Checking replication filtering settings..
Sun May 1 22:46:45 2016 - [info] binlog_do_db= , binlog_ignore_db=
Sun May 1 22:46:45 2016 - [info] Replication filtering check ok.
Sun May 1 22:46:45 2016 - [info] GTID （with auto-pos） is not supported
Sun May 1 22:46:45 2016 - [info] Starting SSH connection tests..
Sun May 1 22:46:46 2016 - [info] All SSH connection tests passed successfully.
Sun May 1 22:46:46 2016 - [info] Checking MHA Node version..
Sun May 1 22:46:47 2016 - [info] Version check ok.
Sun May 1 22:46:47 2016 - [info] Checking SSH publickey authentication settings on the current master..
Sun May 1 22:46:47 2016 - [info] HealthCheck: SSH to 192.168.1.121 is reachable.
Sun May 1 22:46:47 2016 - [info] Master MHA Node version is 0.56.
Sun May 1 22:46:47 2016 - [info] Checking recovery script configurations on 192.168.1.121（192.168.1.121:3306）..
Sun May 1 22:46:47 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
Sun May 1 22:46:47 2016 - [info] Connecting to root@192.168.1.121（192.168.1.121:22）..
Creating /tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /usr/local/mysql/data/, up to master-bin.000008
Sun May 1 22:46:48 2016 - [info] Binlog setting check done.
Sun May 1 22:46:48 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun May 1 22:46:48 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user="root" --slave_host=192.168.1.122 --slave_ip=192.168.1.122 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Sun May 1 22:46:48 2016 - [info] Connecting to root@192.168.1.122（192.168.1.122:22）..
Checking slave recovery environment settings..
Opening /usr/local/mysql/data/relay-log.info ... ok.
Relay log found at /usr/local/mysql/data, up to node2-relay-bin.000002
Temporary relay log file is /usr/local/mysql/data/node2-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file（s）.. done.
Sun May 1 22:46:48 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user="root" --slave_host=192.168.1.123 --slave_ip=192.168.1.123 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Sun May 1 22:46:48 2016 - [info] Connecting to root@192.168.1.123（192.168.1.123:22）..
Checking slave recovery environment settings..
Opening /usr/local/mysql/data/relay-log.info ... ok.
Relay log found at /usr/local/mysql/data, up to slave-relay-bin.000012
Temporary relay log file is /usr/local/mysql/data/slave-relay-bin.000012
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file（s）.. done.
Sun May 1 22:46:48 2016 - [info] Slaves settings check done.
Sun May 1 22:46:48 2016 - [info]
192.168.1.121（192.168.1.121:3306）（current master）
+--192.168.1.122（192.168.1.122:3306）
+--192.168.1.123（192.168.1.123:3306）
Sun May 1 22:46:48 2016 - [info] Checking replication health on 192.168.1.122..
Sun May 1 22:46:48 2016 - [info] ok.
Sun May 1 22:46:48 2016 - [info] Checking replication health on 192.168.1.123..
Sun May 1 22:46:48 2016 - [info] ok.
Sun May 1 22:46:48 2016 - [info] Checking master_ip_failover_script status:
Sun May 1 22:46:48 2016 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.121 --orig_master_ip=192.168.1.121 --orig_master_port=3306
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
Checking the Status of the script.. OK
Sun May 1 22:46:48 2016 - [info] OK.
Sun May 1 22:46:48 2016 - [warning] shutdown_script is not defined.
Sun May 1 22:46:48 2016 - [info] Got exit code 0 （Not master dead）.
MySQL Replication Health is OK.
[root@node5 masterha]#
3. 开启MHA Manager监控（node5操作）如下：
# mkdir -p /var/log/masterha/app1/
# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
参数说明：
--remove_dead_master_conf #该参数代表当发生主从切换后，老的主库的ip将会从配置文件中移除。
--manger_log #日志存放位置
--ignore_last_failover #在缺省情况下，如果MHA检测到连续发生宕机，会生成app1.failover.complete文件，会造成MHA管理进程无法启动。
4. 查看MHA Manager监控是否正常：
# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 （pid:2480） is running（0:PING_OK）, master:192.168.1.121
5. 查看启动日志（node5操作）如下：
# tail -n20 /var/log/masterha/app1/manager.log
6. 关闭MHA Manage监控：
（1）关闭
# masterha_stop --conf=/etc/masterha/app1.cnf
（2）启动
# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
六、MHA Failover切换
1. 自动Failover切换
（1）模拟master mysql关闭
（2） VIP将会切换到node2
（3） /etc/masterha/app1.cnf中将原主服务器配置文件清掉。
（4） masterha_manager监控进程会自动退出关闭，并在/var/log/masterha/app1下生成app1.failover.complete文件，manager.log会记录全过程，从服务器会自动从新的主服务器复制。
（5）原主服务器mysqld启动的，需要清掉/var/log/masterha/app1下生成app1.failover.complete文件，添加node1配置文件到/etc/masterha/app1.cnf，通过manager.log中的记录的故障点，重新同步主服务器，成为从节点。
2. 手动Failover切换
（1）先停MHA Manager进程。
masterha_stop --conf=/etc/masterha/app1.cnf
（2）停掉master mysqld
（3）手动切换，在Manager主机上操作如下：
# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.122 --dead_master_port=3306 --new_master_host=192.168.1.121 --new_master_port=3306 --ignore_last_failover
通过观察日志可以观察切换全过程。
（4）如上节方式恢复节点为从服务器。
3. 正常运行情况下切换（Master正在运行）
等补充。
4. 小结
通过对MMM，MHA的环境搭建测试，MHA由于采用复制架构，原理简单，在一些对数据要求比较高的环境，为了保证可靠性，最好与半同步结合使用。本文永久更新链接地址