  1. Hadoop HDFS
  2. HDFS-16538

EC decoding failed due to not enough valid inputs


Details

    Description

      We hit this error when more than one block read fails during a single call to StripeReader#readStripe().

      We use the EC policy ec(6+3) in our cluster.

      Caused by: org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are provided, not recoverable
              at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
              at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:47)
              at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
              at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
              at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:462)
              at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
              at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:406)
              at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:327)
              at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:420)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:892)
              at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
              at java.base/java.io.DataInputStream.read(DataInputStream.java:149) 

       

      while (!futures.isEmpty()) {
        try {
          StripingChunkReadResult r = StripedBlockUtil
              .getNextCompletedStripedRead(service, futures, 0);
          dfsStripedInputStream.updateReadStats(r.getReadStats());
          DFSClient.LOG.debug("Read task returned: {}, for stripe {}",
              r, alignedStripe);
          StripingChunk returnedChunk = alignedStripe.chunks[r.index];
          Preconditions.checkNotNull(returnedChunk);
          Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING);
      
          if (r.state == StripingChunkReadResult.SUCCESSFUL) {
            returnedChunk.state = StripingChunk.FETCHED;
            alignedStripe.fetchedChunksNum++;
            updateState4SuccessRead(r);
            if (alignedStripe.fetchedChunksNum == dataBlkNum) {
              clearFutures();
              break;
            }
          } else {
            returnedChunk.state = StripingChunk.MISSING;
            // close the corresponding reader
            dfsStripedInputStream.closeReader(readerInfos[r.index]);
      
            final int missing = alignedStripe.missingChunksNum;
            alignedStripe.missingChunksNum++;
            checkMissingBlocks();
      
            readDataForDecoding();
            readParityChunks(alignedStripe.missingChunksNum - missing);
          } 

      This error can be triggered by StatefulStripeReader#decode.

      The reason is:

      1. If more than one data block read fails, readDataForDecoding() is called multiple times.
      2. Each call re-initializes the decodeInputs array.
      3. The parity data that readParityChunks() previously placed in decodeInputs is therefore reset to null, leaving the decoder with fewer valid inputs than it needs.
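
      The three steps above can be reproduced with a toy model. The sketch below is not the Hadoop code: the class DecodeInputsSketch and its methods prepare(), addParity() and validInputsAfterTwoFailures() are hypothetical stand-ins, and plain Object entries replace the real chunk buffers. It only counts how many non-null inputs survive two failed data block reads under ec(6+3), once with the array re-created on every call and once with a null check guarding the allocation. The null check is shown as one possible guard, assumed here for illustration only.

```java
// Toy model of the decodeInputs lifecycle; hypothetical names, not Hadoop classes.
class DecodeInputsSketch {
    static final int DATA_BLKS = 6;    // ec(6+3): six data blocks
    static final int PARITY_BLKS = 3;  // and three parity blocks

    private Object[] decodeInputs;     // stand-in for the real decodeInputs array
    private final boolean guarded;     // true: allocate the array only once

    DecodeInputsSketch(boolean guarded) {
        this.guarded = guarded;
    }

    // Stand-in for the preparation done each time a data block read fails.
    private void prepare(boolean[] missing) {
        if (!guarded || decodeInputs == null) {
            // Without the guard, this wipes parity entries added earlier.
            decodeInputs = new Object[DATA_BLKS + PARITY_BLKS];
        }
        for (int i = 0; i < DATA_BLKS; i++) {
            decodeInputs[i] = missing[i] ? null : new Object();
        }
    }

    // Stand-in for reading one extra parity chunk to cover a new failure.
    private void addParity(int index) {
        decodeInputs[index] = new Object();
    }

    // Simulates data blocks 0 and 1 failing one after the other.
    public static int validInputsAfterTwoFailures(boolean guarded) {
        DecodeInputsSketch s = new DecodeInputsSketch(guarded);
        boolean[] missing = new boolean[DATA_BLKS];
        int nextParity = DATA_BLKS;
        for (int failed = 0; failed < 2; failed++) {
            missing[failed] = true;
            s.prepare(missing);
            s.addParity(nextParity++);
        }
        int valid = 0;
        for (Object o : s.decodeInputs) {
            if (o != null) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // 5 valid inputs: fewer than the 6 needed, so decoding is not recoverable.
        System.out.println("re-initialized: " + validInputsAfterTwoFailures(false));
        // 6 valid inputs: enough to decode.
        System.out.println("guarded: " + validInputsAfterTwoFailures(true));
    }
}
```

      In this model the re-initialized variant ends with four data entries plus only the most recent parity entry, while the guarded variant keeps both parity entries and reaches the six inputs that ec(6+3) needs.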

       

       

       

            People

              qinyuren qinyuren