  1. Hadoop Common
  2. HADOOP-9665

BlockDecompressorStream#decompress throws EOFException instead of returning -1 at EOF

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.2, 2.1.0-beta, 2.3.0
    • Fix Version/s: 1-win, 2.1.0-beta, 1.2.1
    • Component/s: None
    • Labels: None

    Description

      BlockDecompressorStream#decompress ultimately calls rawReadInt, which throws EOFException instead of returning -1 when it encounters the end of a stream. decompress is in turn called by read. However, InputStream#read is supposed to return -1, not throw EOFException, to indicate the end of a stream. This explains why, in LineReader,

            if (bufferPosn >= bufferLength) {
              startPosn = bufferPosn = 0;
              if (prevCharCR)
                ++bytesConsumed; //account for CR from previous read
              bufferLength = in.read(buffer);
              if (bufferLength <= 0)
                break; // EOF
            }
      

      the return value is checked for EOF (bufferLength <= 0) rather than an EOFException being caught.
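The two end-of-stream conventions contrasted above can be seen with JDK classes alone. The following is a minimal sketch, not Hadoop code: the class name `ReadContractDemo` and its helper methods are hypothetical, and `DataInputStream#readInt` is used only as a familiar stand-in for a read that throws at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of Hadoop.
class ReadContractDemo {
    // InputStream#read(byte[]) signals end of stream by returning -1.
    static int readBytes(byte[] source) throws IOException {
        InputStream in = new ByteArrayInputStream(source);
        return in.read(new byte[16]);
    }

    // DataInputStream#readInt signals end of stream by throwing EOFException.
    static boolean readIntThrowsEof(byte[] source) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(source));
        try {
            in.readInt();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] empty = new byte[0];
        System.out.println(readBytes(empty));        // prints -1
        System.out.println(readIntThrowsEof(empty)); // prints true
    }
}
```

A caller written like the LineReader loop handles only the first convention, so an exception thrown from inside read bypasses its EOF check.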

      The problem surfaces with SnappyCodec. If an input file is compressed with SnappyCodec, it must be decompressed through BlockDecompressorStream when it is read. If the file is empty, an EOFException is thrown from rawReadInt and breaks LineReader.
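The empty-input failure can be illustrated with a self-contained sketch. Everything here is an assumption for illustration: `BlockHeaderSketch`, its `rawReadInt` stand-in, and the `drain` loop are hypothetical and do not reproduce Hadoop's implementation or the attached patches. The guarded path shows one possible way to translate the exception into the -1 that a LineReader-style caller expects.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; not the Hadoop BlockDecompressorStream.
class BlockHeaderSketch {
    // Stand-in for rawReadInt: reads a 4-byte big-endian int, throws at end of stream.
    static int rawReadInt(InputStream in) throws IOException {
        int b1 = in.read(), b2 = in.read(), b3 = in.read(), b4 = in.read();
        if ((b1 | b2 | b3 | b4) < 0) {
            throw new EOFException();
        }
        return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
    }

    // Assumed guard: report end of stream as -1, as InputStream#read callers expect.
    static int readHeaderGuarded(InputStream in) throws IOException {
        try {
            return rawReadInt(in);
        } catch (EOFException e) {
            return -1;
        }
    }

    // LineReader-style consumer: stops when a read reports <= 0.
    static int drain(byte[] source, boolean guarded) throws IOException {
        InputStream in = new ByteArrayInputStream(source);
        int headers = 0;
        while (true) {
            int n = guarded ? readHeaderGuarded(in) : rawReadInt(in);
            if (n <= 0) {
                break; // EOF
            }
            headers++;
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(new byte[0], true)); // prints 0
        try {
            drain(new byte[0], false);
        } catch (EOFException e) {
            System.out.println("unguarded read threw EOFException");
        }
    }
}
```

Note that this simple guard also hides a header truncated midway; a stricter variant could return -1 only when the very first byte read is already at end of stream.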

      Attachments

        1. HADOOP-9665-branch-1.1.patch
          7 kB
          Zhijie Shen
        2. HADOOP-9665.2.patch
          3 kB
          Zhijie Shen
        3. HADOOP-9665.1.patch
          0.9 kB
          Zhijie Shen

            People

              Assignee: Zhijie Shen (zjshen)
              Reporter: Zhijie Shen (zjshen)