  1. Apache Tez
  2. TEZ-1634

BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0, 0.6.0
    • Fix Version/s: 0.6.0, 0.5.2
    • Component/s: None
    • Labels: None

    Description

      When IFile.Writer is closed, it explicitly calls compressedOut.finish(). FSDataOutputStream.close() then calls finish() again internally (see o.a.h.i.compress.BlockCompressorStream for details on finish()). The second call writes an additional 4 bytes to the IFile. This causes intermittent failures during shuffle and also prevents IFileInputStream from performing its checksum verification.
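
      The effect of a non-idempotent finish() can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the Hadoop BlockCompressorStream: it assumes only that each finish() call emits a 4-byte integer, and that close() calls finish() again. TrailerStream, DoubleFinishDemo, and the guarded flag are names invented for this illustration and do not represent the actual fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class DoubleFinishDemo {

    /** Hypothetical stream: every finish() emits a 4-byte integer. */
    static class TrailerStream extends FilterOutputStream {
        private final boolean guarded;   // if true, finish() is idempotent
        private boolean finished = false;
        private int pending = 0;         // bytes written since the last finish()

        TrailerStream(OutputStream out, boolean guarded) {
            super(out);
            this.guarded = guarded;
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            pending++;
        }

        void finish() throws IOException {
            if (guarded && finished) {
                return; // second call is a no-op in the guarded variant
            }
            // Emit the pending byte count as a 4-byte big-endian integer.
            out.write(pending >>> 24);
            out.write(pending >>> 16);
            out.write(pending >>> 8);
            out.write(pending);
            pending = 0;
            finished = true;
        }

        @Override
        public void close() throws IOException {
            finish(); // close() finishes the stream again
            super.close();
        }
    }

    /** Writes 3 payload bytes, then calls finish() explicitly followed by close(). */
    static int bytesWritten(boolean guarded) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            TrailerStream stream = new TrailerStream(sink, guarded);
            stream.write(new byte[] {1, 2, 3});
            stream.finish(); // explicit call, as a writer's close() might do
            stream.close();  // calls finish() a second time
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + bytesWritten(false)); // prints "unguarded: 11"
        System.out.println("guarded:   " + bytesWritten(true));  // prints "guarded:   7"
    }
}
```

      In the unguarded variant the sink ends up 4 bytes longer than in the guarded one, mirroring the extra bytes described above.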

      The error occurs only when multiple attempt outputs are fetched using the same URL, and it is easily reproducible with the Snappy compression codec. The fetcher downloads the first attempt's output, but because of the 4 extra bytes at the end of the stream, IFileInputStream does not perform its checksum verification. The download of the subsequent attempt then fails with the following exception.

      An example error from the shuffle phase is shown below.

      >>>>
      2014-09-15 09:54:22,950 WARN fetcher [scope_41] #31 org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
      java.lang.IllegalArgumentException: Invalid header received: partition: 0
      at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
      at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
      at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
      >>>>

      I will attach a debug version of BlockCompressorStream along with a thread dump, which confirms that finish() is called twice in IFile.close(). This bug was also present in earlier versions of Tez, and I can now reproduce it consistently on a local VM.

      Attachments

        1. BlockCompressorStream.with.logging.java
          6 kB
          Rajesh Balamohan
        2. stacktrace-with-comments.txt
          5 kB
          Rajesh Balamohan
        3. TEZ-1634.1.patch
          0.7 kB
          Rajesh Balamohan
        4. TEZ-1634.2.patch
          1.0 kB
          Gopal Vijayaraghavan

          People

            Assignee: Rajesh Balamohan
            Reporter: Rajesh Balamohan
            Votes: 0
            Watchers: 3
