Apache Ozone › HDDS-3816 Erasure Coding › HDDS-5822

EC: Writing a large buffer to an EC file duplicates first chunk in block 1 and 2

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: EC-Branch
    • Component/s: None

    Description

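      If you write a large buffer containing several chunks of data in a single call, like this:

      // dataLength spans several EC chunks
      byte[] inputData = new byte[dataLength];
      RAND.nextBytes(inputData);
      // One write(byte[]) call, so ECKeyOutputStream.write(b, off, len) sees
      // len > ecChunkSize. A per-byte loop (key.write(b) for each byte) would
      // hand write(...) only one byte per call and never hit the multi-chunk path.
      key.write(inputData);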
      

      Then the current EC key logic writes the first chunk twice, to block 1 and block 2, and then probably (I have not verified this) drops the last chunk completely.

      This is due to a bug in ECKeyOutputStream.write(…):

      int currentChunkBufferRemainingLength =
          ecChunkBufferCache.dataBuffers[blockOutputStreamEntryPool.getCurrIdx()]
              .remaining();
      int currentChunkBufferLen =
          ecChunkBufferCache.dataBuffers[blockOutputStreamEntryPool.getCurrIdx()]
              .position();
      int maxLenToCurrChunkBuffer = (int) Math.min(len, ecChunkSize);
      int currentWriterChunkLenToWrite =
          Math.min(currentChunkBufferRemainingLength, maxLenToCurrChunkBuffer);
      // First (possibly partial) chunk: b[off .. off + currentWriterChunkLenToWrite)
      int pos = handleDataWrite(blockOutputStreamEntryPool.getCurrIdx(), b, off,
          currentWriterChunkLenToWrite,
          currentChunkBufferLen + currentWriterChunkLenToWrite == ecChunkSize);
      checkAndWriteParityCells(pos);
      // BUG: "off" is not advanced by currentWriterChunkLenToWrite here, so the
      // first loop iteration below writes the same bytes again.

      int remLen = len - currentWriterChunkLenToWrite;
      int iters = remLen / ecChunkSize;
      int lastCellSize = remLen % ecChunkSize;
      while (iters > 0) {
        pos = handleDataWrite(blockOutputStreamEntryPool.getCurrIdx(), b, off,
            ecChunkSize, true);
        off += ecChunkSize;
        iters--;
        checkAndWriteParityCells(pos);
      }
      

      Here the first chunk is written before entering the "iters" loop, but "off" is never advanced, so the first loop iteration writes the same data again.

      We need to add "currentWriterChunkLenToWrite" to "off" before entering the loop.
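      To make the offset arithmetic concrete, here is a small, self-contained Java model of the quoted code. It is a sketch under stated assumptions, not the Ozone implementation: the class and method names (`EcOffsetTrace`, `trace`, `format`) are hypothetical, it only records the source ranges `[start, end)` that would be passed to `handleDataWrite`, and it assumes the trailing `lastCellSize` bytes (handled outside the quoted snippet) are written from the current value of `off`. The `advanceOffBeforeLoop` flag toggles the proposed fix.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the offset arithmetic in the quoted
 * ECKeyOutputStream.write(...) snippet. It records the source ranges
 * [start, end) that would be handed to handleDataWrite; no data is written.
 */
class EcOffsetTrace {

  static List<int[]> trace(int off, int len, int ecChunkSize,
      int currentChunkBufferRemainingLength, boolean advanceOffBeforeLoop) {
    List<int[]> writes = new ArrayList<>();
    int maxLenToCurrChunkBuffer = Math.min(len, ecChunkSize);
    int currentWriterChunkLenToWrite =
        Math.min(currentChunkBufferRemainingLength, maxLenToCurrChunkBuffer);
    // First (possibly partial) chunk, written before the loop.
    writes.add(new int[] {off, off + currentWriterChunkLenToWrite});
    if (advanceOffBeforeLoop) {
      off += currentWriterChunkLenToWrite; // the proposed fix
    }
    int remLen = len - currentWriterChunkLenToWrite;
    int iters = remLen / ecChunkSize;
    int lastCellSize = remLen % ecChunkSize;
    while (iters > 0) {
      writes.add(new int[] {off, off + ecChunkSize});
      off += ecChunkSize;
      iters--;
    }
    // Assumption: the trailing lastCellSize bytes (handled outside the quoted
    // snippet) are written starting at the current value of off.
    if (lastCellSize > 0) {
      writes.add(new int[] {off, off + lastCellSize});
    }
    return writes;
  }

  static String format(List<int[]> writes) {
    StringBuilder sb = new StringBuilder();
    for (int[] w : writes) {
      sb.append('[').append(w[0]).append(", ").append(w[1]).append(") ");
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    // One 4096-byte write with a 1024-byte chunk size into an empty chunk buffer.
    System.out.println("buggy: " + format(trace(0, 4096, 1024, 1024, false)));
    System.out.println("fixed: " + format(trace(0, 4096, 1024, 1024, true)));
  }
}
```

      With a 4096-byte write and a 1024-byte chunk size, the unfixed trace lists [0, 1024) twice and never reaches [3072, 4096). That matches the duplicated first chunk and, under this model's assumption about the trailing cell, is consistent with the suspected loss of the last chunk. With the flag set, the four ranges are contiguous.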

      We should add a test to reproduce this issue and then add the fix.
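      The core of such a test is to write a multi-chunk buffer in one call and compare what ends up stored with the input. The sketch below shows that check against a hypothetical in-memory stand-in (`ReproduceHdds5822.ChunkSplittingStream`) that copies bytes using the same splitting arithmetic as the quoted snippet; a real test would instead write through an Ozone EC key and read the key back, which is not shown here. It assumes the current chunk buffer starts empty and that the trailing partial cell is written from the current `off`.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical reproduction harness for the duplicated-first-chunk bug. */
class ReproduceHdds5822 {

  /**
   * Hypothetical in-memory stand-in for ECKeyOutputStream: it copies bytes
   * chunk by chunk using the splitting arithmetic of the quoted snippet, and
   * appends every "chunk" to one sink instead of distributing it to blocks.
   */
  static final class ChunkSplittingStream {
    private final int ecChunkSize;
    private final boolean advanceOffBeforeLoop;
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    ChunkSplittingStream(int ecChunkSize, boolean advanceOffBeforeLoop) {
      this.ecChunkSize = ecChunkSize;
      this.advanceOffBeforeLoop = advanceOffBeforeLoop;
    }

    void write(byte[] b, int off, int len) {
      // Assumption: the current chunk buffer is empty, so it accepts a full chunk.
      int first = Math.min(len, ecChunkSize);
      sink.write(b, off, first);
      if (advanceOffBeforeLoop) {
        off += first; // the proposed fix
      }
      int remLen = len - first;
      int iters = remLen / ecChunkSize;
      int lastCellSize = remLen % ecChunkSize;
      while (iters > 0) {
        sink.write(b, off, ecChunkSize);
        off += ecChunkSize;
        iters--;
      }
      // Assumption: the trailing partial cell is written from the current off.
      if (lastCellSize > 0) {
        sink.write(b, off, lastCellSize);
      }
    }

    byte[] written() {
      return sink.toByteArray();
    }
  }

  /** Writes several chunks of random data in one call and compares the result. */
  static boolean roundTrips(boolean advanceOffBeforeLoop) {
    int ecChunkSize = 1024;
    byte[] inputData = new byte[4 * ecChunkSize + 100];
    new Random(42).nextBytes(inputData);
    ChunkSplittingStream key =
        new ChunkSplittingStream(ecChunkSize, advanceOffBeforeLoop);
    key.write(inputData, 0, inputData.length);
    return Arrays.equals(inputData, key.written());
  }

  public static void main(String[] args) {
    System.out.println("without fix: " + roundTrips(false));
    System.out.println("with fix: " + roundTrips(true));
  }
}
```

      Run as a plain Java program, it reports false without the fix and true with it. In a real test the same Arrays.equals comparison would become an assertion on the data read back from the key.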

    People

      • Assignee: Uma Maheswara Rao G (umamaheswararao)
      • Reporter: Stephen O'Donnell (sodonnell)
      • Votes: 0
      • Watchers: 2
