Help needed: a strange case of slow MySQL master-slave replication

2014-6-12 16:34:36

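Two production Linux hosts run as a master-slave pair and had been working fine. Yesterday the slave suddenly fell far behind: Exec_Master_Log_Pos advances for a while, stops, then advances again, over and over. Every parameter I checked looks normal and nothing had been changed beforehand, but SHOW PROCESSLIST now shows the state "Reading event from the relay log", which appears for a while, disappears, and then comes back. Disk I/O went up somewhat after I set sync_binlog = 1 (it was at the default of 0); when write I/O is very low, Exec_Master_Log_Pos may not change at all. Could anyone help me figure out where the problem is? Many thanks.
The InnoDB status output shows entries like this:
MySQL thread id 10, query id 279129 Reading event from the relay log
or:
7 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 5
Details: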
------------
TRANSACTIONS
------------
Trx id counter 42B92CD79
Purge done for trx's n

show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.1.10
Master_User: slaver
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.006524
Read_Master_Log_Pos: 46913321
Relay_Log_File: rep_relay_log.003406
Relay_Log_Pos: 653699241
Relay_Master_Log_File: mysql-bin.006509
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 653699095
Relay_Log_Space: 16153580669
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 110932
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 603333
1 row in set (0.00 sec)
ERROR:
No query specified
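
To put the numbers in the status output above into perspective, here is a minimal Python sketch that parses `Field: value` lines and summarizes the backlog. The helper names (`parse_slave_status`, `binlog_index`, `summarize_lag`) are made up for illustration, and the sketch assumes binlog file names end in a numeric suffix and that Seconds_Behind_Master is a number (it can be NULL when replication is stopped).

```python
import re

# Values copied from the SHOW SLAVE STATUS output posted above (trimmed).
SAMPLE = """
Master_Log_File: mysql-bin.006524
Read_Master_Log_Pos: 46913321
Relay_Master_Log_File: mysql-bin.006509
Exec_Master_Log_Pos: 653699095
Relay_Log_Space: 16153580669
Seconds_Behind_Master: 110932
"""


def parse_slave_status(text):
    """Parse 'Field: value' lines of SHOW SLAVE STATUS output into a dict."""
    status = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s*(.*)$", line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status


def binlog_index(name):
    """Return the numeric suffix of a binlog name such as mysql-bin.006509."""
    return int(name.rsplit(".", 1)[1])


def summarize_lag(status):
    """Summarize how far the SQL thread trails what the I/O thread has fetched."""
    files_behind = (binlog_index(status["Master_Log_File"])
                    - binlog_index(status["Relay_Master_Log_File"]))
    return {
        "files_behind": files_behind,
        # Assumes a numeric value; NULL would need separate handling.
        "hours_behind": round(int(status["Seconds_Behind_Master"]) / 3600, 1),
        "relay_log_gib": round(int(status["Relay_Log_Space"]) / 2**30, 1),
    }


summary = summarize_lag(parse_slave_status(SAMPLE))
print(summary)
```

For the values posted here, the I/O thread is reading mysql-bin.006524 while the SQL thread is still applying mysql-bin.006509, so the backlog sits in the relay logs on the slave rather than in the network transfer from the master.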

千问 · 2014-6-12 16:34:36

Bumping my own thread... could the experts here please take a look?

千问 · 2014-6-12 16:34:36

Has nobody run into this problem before?

千问 · 2014-6-12 16:34:36

Since you recorded the slave status from the moment the lag occurred, go to the master and check which statement was executed at that file and position:
Relay_Master_Log_File: mysql-bin.006509
Exec_Master_Log_Pos: 653699095
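
One way to follow this suggestion is to dump the master's binlog starting at that position with the `mysqlbinlog` client. The Python sketch below only builds the command line; the helper name `mysqlbinlog_command` is hypothetical, and it assumes the command is run on the master from the directory that holds the binlog files.

```python
import shlex


def mysqlbinlog_command(binlog_file, start_position, stop_position=None):
    """Build a mysqlbinlog argument list that dumps events from a given position."""
    args = [
        "mysqlbinlog",
        "--base64-output=DECODE-ROWS",  # show row events as readable pseudo-SQL
        "--verbose",
        f"--start-position={start_position}",
    ]
    if stop_position is not None:
        args.append(f"--stop-position={stop_position}")
    args.append(binlog_file)
    return args


# File and position taken from Relay_Master_Log_File and Exec_Master_Log_Pos.
cmd = mysqlbinlog_command("mysql-bin.006509", 653699095)
print(shlex.join(cmd))
```

Piping the output through a pager is usually enough to see which table the SQL thread was working on when it stalled.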

千问 · 2014-6-12 16:34:36

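wangwenan6 posted on 2016-7-12 16:40
Since you recorded the slave status from the moment the lag occurred, go to the master and check which statement was executed at that file and position,
...
The statements are all ordinary UPDATEs on tables; no large batch operations were run.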

千问 · 2014-6-12 16:34:36

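I checked the binlog files and found no large transactions, only the UPDATEs and INSERTs of normal jobs. One thread (thread/innodb/srv_master_thread) stays at 100% the whole time, and the system keeps showing "Reading event from the relay log". The parameters I adjusted along the way (query_cache_type = 0, query_cache_size = 0, sync_binlog=1) brought no improvement: replication does move, but extremely slowly. This instance shows one more odd behavior: with query_cache_type = 1 it reports "invalidating query cache entries (table)", and once that was eliminated it shows "Reading event from the relay log" for long stretches.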

千问 · 2014-6-12 16:34:36

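OK, I haven't been on the forum for a long time, so sorry for the late reply.
"Reading event from the relay log" by itself should not be a problem; it only means the SQL thread is reading the contents of the relay log.
There is one part of your description I don't understand:
"Exec_Master_Log_Pos advances for a while and then stops": does it suddenly jump very high and then drop to 0? Or is there lag, with the number sometimes standing still and only occasionally moving? Or something else?
As for disk I/O, vmstat shows neither high I/O wait nor anything abnormal in throughput, so what other connection does this problem have with I/O?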

千问 · 2014-6-12 16:34:36

If Exec_Master_Log_Pos keeps moving but only slowly, then, machine configuration problems aside, the cause is SQL on the master that runs without an index.

千问 · 2014-6-12 16:34:36

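Does some large table on the master lack a primary key index? If you really can't find the cause, you could ask an expert to connect and take a look; they should be able to pinpoint it fairly quickly.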
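
A quick way to check the primary key question is to query information_schema. The Python sketch below is only an illustration: the function name `tables_without_primary_key` is hypothetical, and `cursor` is assumed to be any DB-API cursor from a MySQL driver already connected to the master. The check matters most under row-based replication, where the slave has to locate each changed row and falls back to scanning the table when no primary key or unique index exists; the thread does not say which binlog_format is in use.

```python
# Lists base tables that have no PRIMARY KEY constraint, largest first.
# table_rows is only an estimate for InnoDB tables.
TABLES_WITHOUT_PK_SQL = """
SELECT t.table_schema, t.table_name, t.table_rows
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
ORDER BY t.table_rows DESC
"""


def tables_without_primary_key(cursor):
    """Run the query on a DB-API cursor and return (schema, table, rows) tuples."""
    cursor.execute(TABLES_WITHOUT_PK_SQL)
    return cursor.fetchall()
```

If a table named in the binlog events around the stalled position shows up in this list, adding a primary key to it is a reasonable first thing to try.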

千问 · 2014-6-12 16:34:36

Could it be that some statements are not releasing their locks?