Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.7.0
Fix Version/s: None
Description
We encountered a case where some rows are visible (via Phoenix) in the replicated cluster but are not present in the source/active cluster.
Triaging further, we found memstore rollback logs on a few of the region servers.
2021-07-28 14:17:59,353 DEBUG [3,queue=3,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,353 DEBUG [,queue=25,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [3,queue=3,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [,queue=25,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [3,queue=3,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [,queue=25,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,355 DEBUG [3,queue=3,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,355 DEBUG [,queue=25,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
2021-07-28 14:17:59,356 DEBUG [,queue=25,port=60020] regionserver.HRegion - rollbackMemstore rolled back 23
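As a side note, the inconsistency itself can be confirmed with the standard HBase Java client by probing the same row key on both clusters. The sketch below is not part of the original report; the ZooKeeper quorums, table name, and row key are placeholders.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CompareRow {
  // Returns true if the row key exists in the given table on the cluster behind zkQuorum.
  static boolean rowExists(String zkQuorum, String table, byte[] row) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", zkQuorum);
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table t = conn.getTable(TableName.valueOf(table))) {
      return t.exists(new Get(row));
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] row = Bytes.toBytes("some-row-key");                          // placeholder row key
    boolean onSource  = rowExists("source-zk:2181", "MY_TABLE", row);    // active cluster (placeholder quorum)
    boolean onReplica = rowExists("replica-zk:2181", "MY_TABLE", row);   // replicated cluster (placeholder quorum)
    System.out.println("source=" + onSource + " replica=" + onReplica);  // in this incident: source=false replica=true
  }
}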
Looking further into the logs, we found that there were HDFS-layer issues syncing the WAL to HDFS.
It was taking around 6 minutes to sync the WAL. Logs below:
2021-07-28 14:19:30,511 WARN [sync.0] hdfs.DataStreamer - Slow waitForAckedSeqno took 391210ms (threshold=30000ms). File being written: /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620, block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write pipeline datanodes: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,589 WARN [sync.1] hdfs.DataStreamer - Slow waitForAckedSeqno took 391148ms (threshold=30000ms). File being written: /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620, block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write pipeline datanodes: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,589 WARN [sync.2] hdfs.DataStreamer - Slow waitForAckedSeqno took 391147ms (threshold=30000ms). File being written: /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620, block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write pipeline datanodes: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,591 INFO [sync.0] wal.FSHLog - Slow sync cost: 391289 ms, current pipeline: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 INFO [sync.1] wal.FSHLog - Slow sync cost: 391227 ms, current pipeline: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 WARN [sync.1] wal.FSHLog - Requesting log roll because we exceeded slow sync threshold; time=391227 ms, threshold=10000 ms, current pipeline: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 INFO [sync.2] wal.FSHLog - Slow sync cost: 391227 ms, current pipeline: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 WARN [sync.2] wal.FSHLog - Requesting log roll because we exceeded slow sync threshold; time=391227 ms, threshold=10000 ms, current pipeline: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 WARN [sync.3] hdfs.DataStreamer - Slow waitForAckedSeqno took 391217ms (threshold=30000ms). File being written: /hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620, block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write pipeline datanodes: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,591 INFO [sync.3] wal.FSHLog - Slow sync cost: 391217 ms, current pipeline: [DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK], DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK], DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
To add more detail, we also found that there was a NameNode failover, which might be a contributing factor to why the syncs were slow, but we are not sure.
2021-07-28 14:19:30,390 WARN [<rs-host>:60020-3] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491727 ClientNamenodeProtocolTranslatorPB.mkdirs over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,390 WARN [<rs-host>:60020-7] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491726 ClientNamenodeProtocolTranslatorPB.mkdirs over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [20%2C1626191371499,1] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491723 ClientNamenodeProtocolTranslatorPB.getBlockLocations over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [0.40:60020.logRoller] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491729 ClientNamenodeProtocolTranslatorPB.getFileInfo over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [MemStoreFlusher.1] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491730 ClientNamenodeProtocolTranslatorPB.getFileInfo over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [371499.1627480615620] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491731 ClientNamenodeProtocolTranslatorPB.addBlock over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [c@prod] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491725 ClientNamenodeProtocolTranslatorPB.renewLease over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [91371499@<Replicatonsource-thread>] retry.RetryInvocationHandler - A failover has occurred since the start of call #8491724 ClientNamenodeProtocolTranslatorPB.getBlockLocations over <namenode-hostname>/<namenode-ip>:8020
We use the default timeout for hbase.regionserver.wal.sync.timeout, which is 5 minutes.
Since the sync took more than 6 minutes (longer than hbase.regionserver.wal.sync.timeout), the client received an exception and we rolled back the mutations in the memstore on the primary cluster.
But the HDFS sync eventually succeeded, so the row was present in the WAL and was therefore eligible for replication.
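To make the timing race concrete, below is a minimal, self-contained sketch in plain Java (not HBase code; the timeout and sleep values are scaled down for illustration). The writer waits on the sync for a bounded time, gives up and "rolls back", yet the underlying sync still completes afterwards, which mirrors how the edit ends up durable in the WAL and replicable even though the memstore change was rolled back on the primary.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SyncTimeoutRace {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    long syncTimeoutMs = 100;   // stands in for hbase.regionserver.wal.sync.timeout (default 300000 ms)

    // Stands in for the HDFS WAL sync, which in this incident took ~6 minutes.
    Future<?> walSync = pool.submit(() -> {
      try { Thread.sleep(500); } catch (InterruptedException ignored) { }
      System.out.println("WAL sync eventually succeeded (edit is durable and replicable)");
    });

    try {
      walSync.get(syncTimeoutMs, TimeUnit.MILLISECONDS);   // bounded wait on the sync
      System.out.println("sync finished in time, mutation stays in the memstore");
    } catch (TimeoutException e) {
      // This branch corresponds to the rollbackMemstore logs above: the wait gave up,
      // the memstore change is undone and the client sees an exception, but the sync
      // task itself was never cancelled and can still complete.
      System.out.println("sync timed out: memstore rolled back, client sees an exception");
    }

    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}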
bharathv, andrew.purtell@gmail.com, and gjacoby helped brainstorm a possible solution. Will update the possible approaches in the comments section.
Attachments
Issue Links