Hadoop Common / HADOOP-11202

SequenceFile crashes with client-side encrypted files that are shorter than FileSystem.getStatus(path)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None
    • Environment: Amazon EMR 3.0.4

    Description

      Encrypted files are often padded so that the data ends on a 2^n-bit boundary, as the cipher requires. As a result, the encrypted file can be a few bytes larger than the unencrypted file.

      We have a case where an encrypted file is 2 bytes larger than the unencrypted file due to padding.
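
The size difference can be illustrated with a little arithmetic. This is only a sketch: the report does not name the cipher or padding scheme, so the 16-byte block size and the PKCS#7-style rule below (always add between 1 and blockSize bytes) are assumptions, and the class and method names are hypothetical.

```java
// Hypothetical illustration; the block size and padding scheme are assumptions.
final class PaddingOverhead {

    // PKCS#7-style padding: round up to the next block boundary, and add a
    // full extra block when the input is already block-aligned.
    static long paddedLength(long plainLength, int blockSize) {
        return (plainLength / blockSize + 1) * blockSize;
    }

    // Number of bytes by which the encrypted file exceeds the plain file.
    static long overhead(long plainLength, int blockSize) {
        return paddedLength(plainLength, blockSize) - plainLength;
    }

    public static void main(String[] args) {
        int blockSize = 16; // assumed block size in bytes
        long[] samples = {14L, 30L, 16L};
        for (long plain : samples) {
            System.out.println(plain + " plain bytes -> "
                    + paddedLength(plain, blockSize) + " encrypted bytes (+"
                    + overhead(plain, blockSize) + ")");
        }
    }
}
```

Under these assumptions, any plain length that is 14 bytes past a block boundary yields the 2-byte difference described here.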

      When we run a Hive job on the file to get a record count (select count(*) from <table>), it runs org.apache.hadoop.mapred.SequenceFileRecordReader and loads the file through a custom FS InputStream.
      The InputStream decrypts the file as it is read. Splits are handled properly because the stream implements both Seekable and PositionedReadable.

      When the org.apache.hadoop.io.SequenceFile class initializes, it reads the file size from the FileMetadata, which returns the size of the encrypted file on disk (or in this case in S3).
      However, the actual decrypted size is 2 bytes less, so the InputStream returns EOF (-1) before the SequenceFile thinks it is done.
      As a result, SequenceFile$Reader tries to run next -> readRecordLength after the file has been closed, and we get a crash.
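
The failure mode can be reproduced outside Hadoop with a toy reader. The sketch below is not the SequenceFile implementation: the record layout (a 4-byte length followed by the payload) and all names are hypothetical, and the only point is that a loop bounded by a reported length larger than the real stream ends in java.io.EOFException from DataInputStream.readInt, as in the stack dump.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Toy model only; this is not how SequenceFile lays out its data.
final class LengthMismatchDemo {

    // Builds a toy "file": each record is a 4-byte length plus payload bytes.
    static byte[] records(int count) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < count; i++) {
            byte[] payload = ("record-" + i).getBytes(StandardCharsets.UTF_8);
            out.writeInt(payload.length);
            out.write(payload);
        }
        out.flush();
        return buf.toByteArray();
    }

    // Reads records for as long as the position is below the REPORTED length,
    // trusting the metadata rather than the stream.
    static int countByReportedLength(byte[] data, long reportedLength) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long pos = 0;
        int count = 0;
        while (pos < reportedLength) {
            int len = in.readInt();      // throws EOFException if the stream ended early
            in.readFully(new byte[len]);
            pos += 4 + len;
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = records(3);
        System.out.println("accurate length: " + countByReportedLength(data, data.length));
        // Metadata claims 2 more bytes than the stream can deliver.
        countByReportedLength(data, data.length + 2); // ends with java.io.EOFException
    }
}
```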

      The SequenceFile class SHOULD pay attention to the EOF marker from the stream rather than the file size reported in the metadata, and set the 'more' flag accordingly.
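
A minimal sketch of that behavior, again on a toy record format rather than the real SequenceFile layout: the reader below ignores any externally reported length and stops when the stream itself returns -1 at a record boundary. All names are hypothetical. A stream that ends in the middle of a record still raises EOFException, so genuine truncation is not silently hidden.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Toy model only; a sketch of EOF-driven termination, not a Hadoop patch.
final class EofAwareReader {

    // Builds a toy "file": each record is a 4-byte length plus payload bytes.
    static byte[] records(int count) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < count; i++) {
            byte[] payload = ("record-" + i).getBytes(StandardCharsets.UTF_8);
            out.writeInt(payload.length);
            out.write(payload);
        }
        out.flush();
        return buf.toByteArray();
    }

    // Counts records until the stream reports EOF at a record boundary.
    static int countUntilEof(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = 0;
        while (true) {
            int first = in.read();       // -1 here means a clean end of data
            if (first < 0) {
                break;                   // equivalent to setting more = false
            }
            // Assemble the big-endian length from the first byte and the next three.
            int len = (first << 24)
                    | (in.readUnsignedByte() << 16)
                    | (in.readUnsignedByte() << 8)
                    | in.readUnsignedByte();
            in.readFully(new byte[len]); // EOFException if the record is truncated
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("records read: " + countUntilEof(records(3)));
    }
}
```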

      Sample stack dump from crash

      2014-10-10 21:25:27,160 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.io.EOFException
      at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
      at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
      at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
      at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
      Caused by: java.io.IOException: java.io.EOFException
      at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
      at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
      at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
      at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
      at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
      at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
      at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
      ... 11 more
      Caused by: java.io.EOFException
      at java.io.DataInputStream.readInt(DataInputStream.java:392)
      at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:2332)
      at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2363)
      at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2500)
      at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
      at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
      ... 15 more

            People

              Assignee: Unassigned
              Reporter: Corby Wilson (corby10)
              Votes: 0
              Watchers: 6
