Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
Description
During loads into HBase, we are noticing that a region server (RS) sometimes gets stuck.
The logSyncer thread:
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3367)
- locked <0x00002aaac7fef748> (a java.util.LinkedList)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3301)
at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:949)
A lot of other threads are stuck on:
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.addToSyncQueue(HLog.java:916)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:936)
at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:828)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1657)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1425)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1393)
at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1665)
at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2326)
Subsequently, trying to disable the table, which in turn attempts to close the region(s), caused internalFlushcache() to get stuck as well, here:
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:974)
at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:511)
- locked <0x00002aaab76af670> (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:463)
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:1468)
at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1329)
I'll attach the full jstack trace soon.
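To help reason about the three traces together, here is a minimal, self-contained Java sketch of the blocking chain they suggest. It is not HBase code: StuckSyncDemo, closeSucceeds, ack, syncDone and updatesLock are hypothetical names. It also assumes that the put path holds the read side of the same read/write lock that internalFlushcache() wants for writing, which the traces hint at but do not show. The timeout and the interrupts exist only so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StuckSyncDemo {

    /** Returns true if the simulated region close could take the write lock. */
    static boolean closeSucceeds(boolean ackArrives) throws InterruptedException {
        // Stand-in for the HDFS pipeline ack that the syncer waits on.
        CountDownLatch ack = new CountDownLatch(1);
        // Stand-ins for the sync queue's lock/condition (hypothetical names).
        ReentrantLock syncLock = new ReentrantLock();
        Condition syncDone = syncLock.newCondition();
        boolean[] synced = {false};
        // Assumed: puts take the read side, flush/close takes the write side.
        ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
        CountDownLatch writerHoldsReadLock = new CountDownLatch(1);

        // Models the logSyncer thread: blocked until the ack shows up.
        Thread syncer = new Thread(() -> {
            try {
                ack.await();
                syncLock.lock();
                try {
                    synced[0] = true;
                    syncDone.signalAll();
                } finally {
                    syncLock.unlock();
                }
            } catch (InterruptedException e) {
                // demo cleanup only
            }
        });

        // Models a put: holds the read lock while waiting for the sync.
        Thread writer = new Thread(() -> {
            updatesLock.readLock().lock();
            writerHoldsReadLock.countDown();
            try {
                syncLock.lock();
                try {
                    while (!synced[0]) {
                        syncDone.await();
                    }
                } finally {
                    syncLock.unlock();
                }
            } catch (InterruptedException e) {
                // demo cleanup only
            } finally {
                updatesLock.readLock().unlock();
            }
        });

        syncer.start();
        writer.start();
        writerHoldsReadLock.await();
        if (ackArrives) {
            ack.countDown();
        }

        // Models the close/flush path; a timed tryLock replaces the
        // unbounded lock() so the demo can report the hang and move on.
        boolean acquired = updatesLock.writeLock().tryLock(500, TimeUnit.MILLISECONDS);
        if (acquired) {
            updatesLock.writeLock().unlock();
        }

        // Unblock whatever is still waiting so the demo exits cleanly.
        syncer.interrupt();
        writer.interrupt();
        syncer.join();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("ack arrives: close succeeds = " + closeSucceeds(true));
        System.out.println("ack missing: close succeeds = " + closeSucceeds(false));
    }
}
```

In this model the close only fails when the ack never arrives, which is consistent with the hang originating below HBase, in the HDFS sync call, rather than in the region lock itself.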
Attachments
Issue Links
- is related to: HDFS-724 Pipeline close hangs if one of the datanode is not responsive. (Closed)