  1. Hadoop HDFS
  2. HDFS-16538

EC decoding failed due to not enough valid inputs


Details

    Description

      We hit the following error when more than one block read fails inside #StripeReader.readStripe().

      We use the EC policy ec(6+3) in our cluster.

      Caused by: org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are provided, not recoverable
              at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
              at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:47)
              at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
              at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
              at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:462)
              at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
              at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:406)
              at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:327)
              at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:420)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:892)
              at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
              at java.base/java.io.DataInputStream.read(DataInputStream.java:149) 

      The relevant loop in #StripeReader.readStripe():

       

      while (!futures.isEmpty()) {
        try {
          StripingChunkReadResult r = StripedBlockUtil
              .getNextCompletedStripedRead(service, futures, 0);
          dfsStripedInputStream.updateReadStats(r.getReadStats());
          DFSClient.LOG.debug("Read task returned: {}, for stripe {}",
              r, alignedStripe);
          StripingChunk returnedChunk = alignedStripe.chunks[r.index];
          Preconditions.checkNotNull(returnedChunk);
          Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING);
      
          if (r.state == StripingChunkReadResult.SUCCESSFUL) {
            returnedChunk.state = StripingChunk.FETCHED;
            alignedStripe.fetchedChunksNum++;
            updateState4SuccessRead(r);
            if (alignedStripe.fetchedChunksNum == dataBlkNum) {
              clearFutures();
              break;
            }
          } else {
            returnedChunk.state = StripingChunk.MISSING;
            // close the corresponding reader
            dfsStripedInputStream.closeReader(readerInfos[r.index]);
      
            final int missing = alignedStripe.missingChunksNum;
            alignedStripe.missingChunksNum++;
            checkMissingBlocks();
      
            readDataForDecoding();
            readParityChunks(alignedStripe.missingChunksNum - missing);
          }
          // ... (rest of the loop omitted)

      This error can be triggered by #StatefulStripeReader.decode.

      The reason is:

      1. If more than one data block read fails, #readDataForDecoding is called multiple times.
      2. Each call re-initializes the decodeInputs array.
      3. The parity data that #readParityChunks previously stored in decodeInputs is therefore reset to null.
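
      The three steps above can be reproduced in a small stand-alone model. This is only a sketch, not the HDFS code: the class name DecodeInputsSketch, the guardReinit flag, the parityRequested array and the one-byte placeholder buffers are assumptions made for illustration. Only the names readDataForDecoding, readParityChunks and decodeInputs come from the report. The guard shown is one possible way to avoid the re-initialization, not necessarily the fix adopted upstream.

```java
// Simplified model of the decodeInputs handling for an ec(6+3) stripe.
class DecodeInputsSketch {
    static final int DATA = 6;    // data units in ec(6+3)
    static final int PARITY = 3;  // parity units in ec(6+3)

    byte[][] decodeInputs;                          // null entry = no input for that unit
    final boolean[] parityRequested = new boolean[PARITY];
    final boolean guardReinit;                      // hypothetical switch: skip re-allocation

    DecodeInputsSketch(boolean guardReinit) {
        this.guardReinit = guardReinit;
    }

    // Models step 2: without the guard, every call allocates a fresh array,
    // which drops any parity entries filled in earlier.
    void readDataForDecoding() {
        if (guardReinit && decodeInputs != null) {
            return;
        }
        decodeInputs = new byte[DATA + PARITY][];
        for (int i = 0; i < DATA; i++) {
            decodeInputs[i] = new byte[1];          // placeholder for a data chunk buffer
        }
    }

    // Requests up to `num` parity chunks that have not been requested before.
    void readParityChunks(int num) {
        for (int i = 0, j = 0; i < PARITY && j < num; i++) {
            if (!parityRequested[i]) {
                parityRequested[i] = true;
                decodeInputs[DATA + i] = new byte[1]; // placeholder for a parity buffer
                j++;
            }
        }
    }

    // Replays one failure-handling pass per failed data block, then counts
    // the non-null inputs that would be handed to the decoder.
    static int validInputsAfterFailures(boolean guardReinit, int... failedDataIndexes) {
        if (failedDataIndexes.length == 0) {
            return DATA;                            // nothing failed, no decoding needed
        }
        DecodeInputsSketch s = new DecodeInputsSketch(guardReinit);
        for (int ignored : failedDataIndexes) {
            s.readDataForDecoding();
            s.readParityChunks(1);
        }
        for (int idx : failedDataIndexes) {
            s.decodeInputs[idx] = null;             // failed data blocks are erased units
        }
        int valid = 0;
        for (byte[] input : s.decodeInputs) {
            if (input != null) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(validInputsAfterFailures(false, 0, 1)); // prints 5
        System.out.println(validInputsAfterFailures(true, 0, 1));  // prints 6
    }
}
```

      With data blocks 0 and 1 failing, the unguarded model ends with only 5 valid inputs (data 2 to 5 plus the second parity chunk), which is below the 6 that ec(6+3) decoding needs. With the guard, the first parity chunk survives and 6 valid inputs remain.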

       

       

       

          People

            Assignee: qinyuren
            Reporter: qinyuren
