  Hadoop HDFS / HDFS-16544

EC decoding failed due to invalid buffer



    Description

      In HDFS-16538, we found an EC file decoding bug that occurs when more than one data block read fails.

      We have now found another bug, triggered by #StatefulStripeReader.decode.

      If we read an EC file that is longer than one stripe, and the file has one data block and the first parity block corrupted, the following error occurs:

      org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
          at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
          at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
          at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
          at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
          at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
          at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
          at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
          at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
          at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
          at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) 

       

      Let's say we use EC (6+3) and data block[0] and the first parity block[6] are corrupted.

      1. The readers for block[0] and block[6] are closed after reading the first stripe of the EC file;
      2. When the client reads the second stripe of the EC file, it triggers #prepareParityChunk for block[6];
      3. decodeInputs[6] is never constructed because the reader for block[6] was already closed, so the decoder is handed a null buffer.
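      The failure mode in the steps above can be sketched with a minimal, self-contained model (these are not the actual Hadoop classes; the validation and buffer sizes are simplified stand-ins for ByteBufferDecodingState's null check):

      ```java
      import java.nio.ByteBuffer;

      public class NullDecodeInputDemo {
          // Simplified stand-in for ByteBufferDecodingState's validation:
          // every buffer slot the decoder is asked to use must be non-null.
          static void checkBuffers(ByteBuffer[] buffers, int[] indicesToUse) {
              for (int i : indicesToUse) {
                  if (buffers[i] == null) {
                      throw new IllegalArgumentException(
                          "Invalid buffer found, not allowing null (index " + i + ")");
                  }
              }
          }

          public static void main(String[] args) {
              int dataBlkNum = 6, parityBlkNum = 3;
              ByteBuffer[] decodeInputs = new ByteBuffer[dataBlkNum + parityBlkNum];
              // Data blocks 1..5 were read successfully; block[0] is corrupt.
              for (int i = 1; i < dataBlkNum; i++) {
                  decodeInputs[i] = ByteBuffer.allocate(16);
              }
              // prepareParityChunk(6) returned early because the reader for
              // block[6] was closed after the first stripe, so slot 6 stays
              // null even though the stripe state still lists it as an input.
              try {
                  checkBuffers(decodeInputs, new int[]{1, 2, 3, 4, 5, 6});
              } catch (IllegalArgumentException e) {
                  System.out.println(e.getMessage());
              }
          }
      }
      ```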

       

      boolean prepareParityChunk(int index) {
        Preconditions.checkState(index >= dataBlkNum
            && alignedStripe.chunks[index] == null);
        if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
          // The reader for this parity block failed earlier (e.g. while
          // reading the first stripe) and was closed. The chunk is marked
          // MISSING and the method returns WITHOUT constructing
          // decodeInputs[index], which is what later feeds a null buffer
          // into the decoder.
          alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
          return false;
        }
        // Normal path: slice this parity block's window out of the shared
        // parity buffer and register it as a decode input.
        final int parityIndex = index - dataBlkNum;
        ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
        buf.position(cellSize * parityIndex);
        buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
        decodeInputs[index] =
            new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
        alignedStripe.chunks[index] =
            new StripingChunk(decodeInputs[index].getBuffer());
        return true;
      }
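      For reference, the duplicate()/position()/limit()/slice() sequence in the normal path carves a per-parity-block view out of the one shared parity buffer. A standalone java.nio sketch with toy sizes (the cellSize and spanInBlock values here are made up for illustration):

      ```java
      import java.nio.ByteBuffer;

      public class ParitySliceDemo {
          public static void main(String[] args) {
              int cellSize = 8;      // toy value; the real HDFS cell size is much larger
              int parityBlkNum = 3;
              // One shared buffer holding all parity cells back to back.
              ByteBuffer parityBuffer = ByteBuffer.allocate(cellSize * parityBlkNum);

              int parityIndex = 1;   // second parity block
              int spanInBlock = 5;   // portion of the cell this stripe range covers
              ByteBuffer buf = parityBuffer.duplicate();  // independent position/limit
              buf.position(cellSize * parityIndex);
              buf.limit(cellSize * parityIndex + spanInBlock);
              ByteBuffer slice = buf.slice();             // zero-based view of that window

              System.out.println(slice.position());  // 0
              System.out.println(slice.remaining()); // 5
          }
      }
      ```

      Because duplicate() gives the slice its own position and limit, carving out one parity block's window never disturbs the shared buffer or the views handed to other decode inputs.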

       

      People: qinyuren (assignee and reporter)