Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
In HDFS-16538, we found an EC file decoding bug that occurs when more than one data block read fails.
Now we have found another bug, triggered by StatefulStripeReader#decode.
If we read an EC file that is longer than one stripe, and the file has one corrupted data block and a corrupted first parity block, the following error occurs:
org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
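The exception above comes from the decoder's buffer validation: before decoding, every buffer handed to it must be non-null. A minimal standalone sketch of that kind of check (hypothetical class name, plain Java, not the actual Hadoop ByteBufferDecodingState code):

```java
import java.nio.ByteBuffer;

// Minimal model (not the Hadoop class) of the validation that produces
// "Invalid buffer found, not allowing null": reject any null slot in the
// buffer array passed to the decoder.
public class CheckBuffersSketch {
  static void checkBuffers(ByteBuffer[] buffers) {
    for (ByteBuffer buffer : buffers) {
      if (buffer == null) {
        throw new IllegalArgumentException(
            "Invalid buffer found, not allowing null");
      }
    }
  }

  public static void main(String[] args) {
    // One slot is null, as happens when a decode input was never constructed.
    ByteBuffer[] buffers = new ByteBuffer[] { ByteBuffer.allocate(64), null };
    try {
      checkBuffers(buffers);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

This is why a single unconstructed decode input is enough to fail the whole stripe read.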
Let's say we use EC (6+3), and data block[0] and the first parity block, block[6], are corrupted.
- The readers for block[0] and block[6] are closed after reading the first stripe of the EC file;
- When the client reads the second stripe of the EC file, it triggers #prepareParityChunk for block[6];
- decodeInputs[6] will not be constructed because the reader for block[6] was already closed:
boolean prepareParityChunk(int index) {
  Preconditions.checkState(index >= dataBlkNum
      && alignedStripe.chunks[index] == null);
  if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
    alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
    // we have failed the block reader before
    return false;
  }
  final int parityIndex = index - dataBlkNum;
  ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
  buf.position(cellSize * parityIndex);
  buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
  decodeInputs[index] = new ECChunk(buf.slice(), 0,
      (int) alignedStripe.range.spanInBlock);
  alignedStripe.chunks[index] =
      new StripingChunk(decodeInputs[index].getBuffer());
  return true;
}
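The failure sequence can be condensed into a toy model (hypothetical names, no Hadoop dependencies): a parity reader that failed during the first stripe is marked as should-skip, so in the second stripe prepareParityChunk bails out early and the corresponding decode input is never built, leaving the null that the decoder later rejects.

```java
import java.nio.ByteBuffer;

// Toy model (not Hadoop code) of the stateful-reader bug: a reader that
// failed in stripe 1 is flagged shouldSkip, so stripe 2's
// prepareParityChunk() returns early and decodeInputs[index] stays null.
public class StatefulSkipSketch {
  static final int DATA_BLK_NUM = 6;
  static final int PARITY_BLK_NUM = 3;

  // shouldSkip[i] == true means "this block's reader failed earlier".
  static boolean[] shouldSkip = new boolean[DATA_BLK_NUM + PARITY_BLK_NUM];
  static ByteBuffer[] decodeInputs = new ByteBuffer[DATA_BLK_NUM + PARITY_BLK_NUM];

  static boolean prepareParityChunk(int index) {
    if (shouldSkip[index]) {
      return false;                        // decodeInputs[index] stays null
    }
    decodeInputs[index] = ByteBuffer.allocate(64);
    return true;
  }

  public static void main(String[] args) {
    // Stripe 1: reads of block[0] and parity block[6] fail; both readers
    // are closed and flagged.
    shouldSkip[0] = true;
    shouldSkip[6] = true;

    // Stripe 2: decoding needs parity block[6], but the flagged reader
    // means its input buffer is never constructed.
    boolean prepared = prepareParityChunk(6);
    System.out.println("prepared=" + prepared
        + ", decodeInputs[6]=" + decodeInputs[6]);
    // prints: prepared=false, decodeInputs[6]=null
  }
}
```

The null slot then reaches the decoder's buffer validation and raises the HadoopIllegalArgumentException shown in the stack trace.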