Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-10682

EC Reconstruction creates empty chunks at the end of blocks with partial stripes

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.5.0, 1.4.1
    • None

    Description

      Given an EC block that is larger than 1 full stripe, but the last stripe is partial so that it does not use all the index.

      If one of the replicas is reconstructed that does not have any data in that final position, an empty chunk is written to the end of the block's chunk list.

      While this does no cause any immediate problem, it can prevent further reconstructions that attempt to use this block, and they will fail with an error like:

      2024-04-09 01:06:21,855 ERROR [ec-reconstruct-reader-TID-4]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 7f6f1fc9-ed26-4e19-86b6-47435b027f6a, Nodes: 7f6f1fc9-ed26-4e19-86b6-47435b027f6a(ccycloud-4.quasar-jyswng.root.comops.site/10.140.150.0), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2024-04-09T01:06:21.724509Z[UTC]].
      2024-04-09 01:06:21,859 INFO [ContainerReplicationThread-1]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream: ECBlockReconstructedStripeInputStream{conID: 10007 locID: 113750153625610009}@756a3998: error reading [1], marked as failed
      org.apache.hadoop.ozone.client.io.BadDataLocationException: java.io.IOException: Failed to get chunkInfo[77]: len == 0
              at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:644)
              at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:577)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:834)
      Caused by: java.io.IOException: Failed to get chunkInfo[77]: len == 0
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
              at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
              at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
              at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
              at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:655)
              at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:631)
              ... 5 more
      

      If there are other spare replicas which can be used, reconstruction will continue, otherwise it will not be able to complete.

      At this stage, I am not sure if this can affect reading a block via the normal read path.

      Attachments

        Issue Links

          Activity

            People

              sodonnell Stephen O'Donnell
              sodonnell Stephen O'Donnell
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: