Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 3.2.0
Component/s: None
Hadoop Flags: Reviewed
Description
When writing EC files, if the client's addBlock() call requesting the second (or any later) block group happens to exceed the space quota, the client program hangs forever.
See the demo below:
$ hadoop fs -mkdir -p /user/wanghongbing/quota/ec
$ hdfs dfsadmin -setSpaceQuota 2g /user/wanghongbing/quota
$ hdfs ec -setPolicy -path /user/wanghongbing/quota/ec -policy RS-6-3-1024k
Set RS-6-3-1024k erasure coding policy on /user/wanghongbing/quota/ec
$ hadoop fs -put 800m /user/wanghongbing/quota/ec
^@^@^@^@^@^@^@^@^Z
With blocksize=128M, spaceQuota=2g and the RS-6-3 EC policy, a block group must claim 1152M of physical space to write 768M of logical data. Writing 800M of data therefore exceeds the quota when the second block group is requested, and at that point the client hangs forever.
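Concretely, the arithmetic from the figures above:

    logical data per block group   = 6 data blocks × 128M       = 768M
    physical space per block group = (6 data + 3 parity) × 128M = 1152M
    800M > 768M, so the file needs a second block group
    2 block groups × 1152M = 2304M > 2048M (2g), so the second addBlock() breaks the quota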
The stack of the hung client thread is as follows:
java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000008009d5d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.takeWithTimeout(DFSStripedOutputStream.java:117)
    at org.apache.hadoop.hdfs.DFSStripedOutputStream.waitEndBlocks(DFSStripedOutputStream.java:453)
    at org.apache.hadoop.hdfs.DFSStripedOutputStream.allocateNewBlock(DFSStripedOutputStream.java:477)
    at org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:541)
    - locked <0x000000008009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
    - locked <0x000000008009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
    - locked <0x000000008009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
    at org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1182)
    - locked <0x000000008009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
    - locked <0x000000008009f758> (a org.apache.hadoop.hdfs.DFSStripedOutputStream)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
    at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
    at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:77)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
    at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
    at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:407)
    at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
    at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
    at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
When addBlock() throws the exception, the program calls DFSStripedOutputStream.closeImpl() -> flushBuffer() -> writeChunk() -> allocateNewBlock() -> waitEndBlocks(), and waitEndBlocks() enters an infinite loop because the coordinator's endBlocks queue is empty:
private void waitEndBlocks(int i) throws IOException {
  while (getStripedDataStreamer(i).isHealthy()) {
    final ExtendedBlock b = coordinator.endBlocks.takeWithTimeout(i);
    if (b != null) {
      StripedBlockUtil.checkBlocks(currentBlockGroup, i, b);
      return;
    }
  }
}
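The loop can only exit when a block arrives on the queue or the streamer stops reporting healthy. Here is a minimal, self-contained sketch of that pattern (toy Java, not HDFS code; `healthy` and `endBlocks` are analogues of StripedDataStreamer.isHealthy() and coordinator.endBlocks): after addBlock() has failed, nothing is ever offered to the queue and nothing marks the streamer unhealthy, so the timed poll returns null and the loop repeats forever.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EndBlocksHangDemo {
  // Analogue of StripedDataStreamer.isHealthy(): nothing ever flips this
  // to false after the failed addBlock(), which is the root of the hang.
  static volatile boolean healthy = true;
  // Analogue of coordinator.endBlocks: stays empty because the block
  // allocation failed, so no ended block is ever queued.
  static final LinkedBlockingQueue<String> endBlocks = new LinkedBlockingQueue<>();

  static String waitEndBlocks() throws InterruptedException {
    while (healthy) {
      // takeWithTimeout() analogue: returns null when the timeout elapses.
      String block = endBlocks.poll(1, TimeUnit.SECONDS);
      if (block != null) {
        return block;
      }
      // Timed out, but the streamer still looks healthy: loop again.
    }
    throw new IllegalStateException("streamer failed");
  }

  public static void main(String[] args) throws InterruptedException {
    waitEndBlocks(); // never returns: the TIMED_WAITING loop in the stack trace above
  }
}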
So to fix it, I close all the striped data streamers when an exception occurs in addBlock(), which makes isHealthy() return false and lets waitEndBlocks() exit.
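A sketch of that idea in the same toy terms (an illustration of the approach, not the actual HDFS patch; requestBlock() is a hypothetical stand-in for the addBlock() RPC): marking the streamers unhealthy on failure lets waitEndBlocks() terminate with an error instead of spinning.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EndBlocksFixSketch {
  static volatile boolean healthy = true;
  static final LinkedBlockingQueue<String> endBlocks = new LinkedBlockingQueue<>();

  // Hypothetical stand-in for the addBlock() RPC that hits the quota.
  static String requestBlock() throws Exception {
    throw new Exception("quota exceeded");
  }

  static String allocateNewBlock() throws Exception {
    try {
      return requestBlock();
    } catch (Exception e) {
      // The fix: "close" all streamers on failure so that the health check
      // turns false and waitEndBlocks() can exit its loop.
      healthy = false;
      throw e;
    }
  }

  static String waitEndBlocks() throws InterruptedException {
    while (healthy) {
      String block = endBlocks.poll(1, TimeUnit.SECONDS);
      if (block != null) {
        return block;
      }
    }
    throw new IllegalStateException("streamer failed; aborting close");
  }

  public static void main(String[] args) throws InterruptedException {
    try {
      allocateNewBlock();
    } catch (Exception expected) {
      // Swallowed here; the real client surfaces it to the caller.
    }
    try {
      waitEndBlocks(); // now terminates promptly instead of hanging
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}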
Attachments
Issue Links
- requires: HDFS-7663 Erasure Coding: Append on striped file (Resolved)