  1. Apache Ozone
  2. HDDS-3816 Erasure Coding
  3. HDDS-6295

EC: Fix unaligned stripe write failure due to length overflow.



    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • EC-Branch
    • None


      We hit a bug when trying to write a small key of length 321 bytes (/etc/hosts on my box).

              at com.google.common.base.Preconditions.checkArgument(Preconditions.java:130)
              at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.close(ECKeyOutputStream.java:543)
              at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
              at org.apache.hadoop.ozone.shell.keys.PutKeyHandler.execute(PutKeyHandler.java:107)
              at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:98)
              at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:44)
              at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
              at picocli.CommandLine.access$1300(CommandLine.java:145)
              at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
              at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
              at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
              at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
              at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
              at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
              at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
              at org.apache.hadoop.ozone.shell.OzoneShell.lambda$execute$17(OzoneShell.java:55)
              at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
              at org.apache.hadoop.ozone.shell.OzoneShell.execute(OzoneShell.java:53)
              at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
              at org.apache.hadoop.ozone.shell.OzoneShell.main(OzoneShell.java:47) 

      Commands that we ran:

      ./bin/ozone sh bucket create vol1/bucket1 --layout=FILE_SYSTEM_OPTIMIZED --replication=rs-10-4-1024k --type EC
      ./bin/ozone sh key put /vol1/bucket1/hosts /etc/hosts

      We also tested more cases that cause the same problem, such as:

      dd if=/dev/zero of=dd.10.1M bs=1K count=10241
      ./bin/ozone sh key put /vol1/bucket1/dd.10.1M dd.10.1M

      And some examples that succeeded:

      • key of aligned size
      dd if=/dev/zero of=dd.10M bs=1K count=10240
      ./bin/ozone sh key put /vol1/bucket1/dd.10M ../dd.10M 
      • bucket with policy rs-3-2-1024k
      ./bin/ozone sh bucket create vol1/bucket2 --layout=FILE_SYSTEM_OPTIMIZED --replication=rs-3-2-1024k --type EC 
      ./bin/ozone sh key put /vol1/bucket2/dd.10M ../dd.10M 

      Digging into the code, I found a potential int overflow in ECKeyOutputStream:

      private void handleOutputStreamWrite(int currIdx, long len,
          boolean isFullCell, boolean isParity) {
        BlockOutputStreamEntry current =
        int writeLengthToCurrStream =
            Math.min((int) len, (int) current.getRemaining());                      // <-- int overflow happens
        currentBlockGroupLen += isParity ? 0 : writeLengthToCurrStream;
        if (isFullCell) {
          ByteBuffer bytesToWrite = isParity ?
              ecChunkBufferCache.getParityBuffers()[currIdx - numDataBlks] :
          try {
            // Since it's a fullcell, let's write all content from buffer.
            writeToOutputStream(current, len, bytesToWrite.array(),
                bytesToWrite.limit(), 0, isParity);
          } catch (Exception e) {

      This is because BlockOutputStreamEntry#getRemaining() returns the remaining bytes in the current stream entry. For EC, the newly defined length of that entry is the whole length of the "big" stripe across the block group, not the length of a single block. Casting that value to int can therefore overflow when the policy has a large number of data blocks, such as EC(10:4).
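      One way to avoid the narrowing problem is to compare the two values as long and narrow only the result. The sketch below is an illustration of that idea, not necessarily the change committed for this issue; the class name SafeWriteLength and the helper writeLength are hypothetical and do not exist in Ozone.

```java
public class SafeWriteLength {

    // Hypothetical helper: take the minimum while both operands are still
    // long, then narrow. Math.toIntExact throws ArithmeticException instead
    // of silently wrapping if the result does not fit in an int.
    static int writeLength(long len, long remaining) {
        return Math.toIntExact(Math.min(len, remaining));
    }

    public static void main(String[] args) {
        long blockGroupLength = 10L * 256 * 1024 * 1024; // EC(10:4), 256M blocks
        // The key from the report: 321 bytes, nothing written yet.
        System.out.println(writeLength(321L, blockGroupLength));
    }
}
```

      Because the minimum is bounded by len, the narrowing is safe whenever the caller's len itself fits in an int, regardless of how large the remaining length of the block group is.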

      The passed-in length = blockSize, which defaults to 256M.
      With EC policy 10-4-1024k, we get:
      this.length = replicationConfig.getData() * length
                  = 10 * 256M
                  = 2684354560  > INT_MAX(2147483647)
      With EC policy 3-2-1024k, we don't have this overflow:
      this.length = replicationConfig.getData() * length
                  = 3 * 256M
                  = 805306368   < INT_MAX(2147483647)
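      The arithmetic above can be checked with a small standalone program. This is only a sketch: the class EcLengthOverflowDemo and its helpers are made up for illustration, and only the cast-then-min pattern is taken from handleOutputStreamWrite.

```java
public class EcLengthOverflowDemo {

    // Default block size from the description: 256M = 256 * 1024 * 1024 bytes.
    static final long BLOCK_SIZE = 256L * 1024 * 1024;

    // Block group length as defined for EC: data block count * block size.
    static long blockGroupLength(int dataBlocks) {
        return dataBlocks * BLOCK_SIZE;
    }

    // Same pattern as the reported code: narrow both operands to int first,
    // then take the minimum.
    static int writeLengthWithEarlyCast(long len, long remaining) {
        return Math.min((int) len, (int) remaining);
    }

    public static void main(String[] args) {
        long keyLen = 321; // size of the key from the report
        for (int data : new int[] {3, 10}) {
            long remaining = blockGroupLength(data);
            System.out.println("data=" + data
                + " remaining=" + remaining
                + " fitsInInt=" + (remaining <= Integer.MAX_VALUE)
                + " writeLength=" + writeLengthWithEarlyCast(keyLen, remaining));
        }
    }
}
```

      With 3 data blocks the computed write length is 321, as expected. With 10 data blocks the cast wraps 2684354560 around to a negative int, so the minimum is negative, which is consistent with a checkArgument failure further down the write path.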

      Some code for reference:

      // BlockOutputStreamEntry
      long getRemaining() {
        return getLength() - getCurrentPosition();
      // ECBlockOutputStreamEntry
        ECBlockOutputStreamEntry(BlockID blockID, String key,
            XceiverClientFactory xceiverClientManager, Pipeline pipeline, long length,
            BufferPool bufferPool, Token<OzoneBlockTokenIdentifier> token,
            OzoneClientConfig config) {
          super(blockID, key, xceiverClientManager, pipeline, length, bufferPool,
              token, config);
              pipeline.getReplicationConfig(), ECReplicationConfig.class);
          this.replicationConfig =
              (ECReplicationConfig) pipeline.getReplicationConfig();
          this.length = replicationConfig.getData() * length;                         // <-- a new length of the block group defined for EC
        @Override public long getLength() { return length; }
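      To see how the overridden getLength() changes what getRemaining() returns, here is a stripped-down model of the two classes. It is a sketch under assumptions: Entry and EcEntry are simplified stand-ins with no I/O, pipeline, or buffer handling, and the advance method is hypothetical.

```java
public class EntryLengthSketch {

    // Simplified stand-in for BlockOutputStreamEntry: length bookkeeping only.
    static class Entry {
        private final long length;
        private long currentPosition;

        Entry(long length) {
            this.length = length;
        }

        long getLength() {
            return length;
        }

        long getCurrentPosition() {
            return currentPosition;
        }

        // Hypothetical helper that records n bytes as written.
        void advance(long n) {
            currentPosition += n;
        }

        // Same formula as in the reference code above.
        long getRemaining() {
            return getLength() - getCurrentPosition();
        }
    }

    // Simplified stand-in for ECBlockOutputStreamEntry: the length covers
    // the whole block group (data block count * block length).
    static class EcEntry extends Entry {
        private final long length;

        EcEntry(int dataBlocks, long blockLength) {
            super(blockLength);
            this.length = dataBlocks * blockLength;
        }

        @Override
        long getLength() {
            return length;
        }
    }

    public static void main(String[] args) {
        long blockSize = 256L * 1024 * 1024;
        Entry plain = new Entry(blockSize);
        Entry ec = new EcEntry(10, blockSize);
        System.out.println("plain remaining = " + plain.getRemaining());
        System.out.println("EC remaining    = " + ec.getRemaining());
    }
}
```

      In this model the multiplication itself is done in long arithmetic and is fine; the value only becomes a problem once a caller narrows getRemaining() to int.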





              markgui Mark Gui