Sqoop / SQOOP-3243

Importing BLOB data causes "Stream closed" error on encrypted HDFS


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.6
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels: None

      Description

      Importing BLOB data into an encrypted zone causes a "Stream closed" error if:

      • the BLOB data is larger than 16 MB, so a LobFile is used
      • Java 8 is used, whose implementation of the close() method of FilterOutputStream differs from Java 7's
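The Java 7 vs. Java 8 difference can be sketched as follows. The two close() bodies below are paraphrased from the respective JDK sources, and FailingFlushStream is a hypothetical stand-in for an already-closed CryptoOutputStream whose flush() fails:

```java
import java.io.IOException;
import java.io.OutputStream;

public class CloseBehaviorDemo {

    // Hypothetical stand-in for an already-closed CryptoOutputStream:
    // any flush() fails with "Stream closed".
    static class FailingFlushStream extends OutputStream {
        @Override public void write(int b) {}
        @Override public void flush() throws IOException {
            throw new IOException("Stream closed");
        }
    }

    // Paraphrase of Java 7's FilterOutputStream.close():
    // the flush failure is swallowed, so close() succeeds.
    static String java7StyleClose(OutputStream out) {
        try {
            try {
                out.flush();
            } catch (IOException ignored) {
                // Java 7 deliberately ignores the flush failure
            }
            out.close();
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Paraphrase of Java 8's FilterOutputStream.close():
    // try-with-resources still closes the stream, but the flush
    // failure now propagates instead of being swallowed.
    static String java8StyleClose(OutputStream out) {
        try {
            try (OutputStream o = out) {
                out.flush();
            }
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("Java 7 style: " + java7StyleClose(new FailingFlushStream()));
        System.out.println("Java 8 style: " + java8StyleClose(new FailingFlushStream()));
    }
}
```

Under Java 7 semantics the second close is harmless; under Java 8 semantics the same call surfaces the "Stream closed" IOException seen in the stack trace below.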

      Exception and stack trace:

      17/10/12 07:16:04 INFO mapreduce.Job: Running job: job_1507777811520_5091
      17/10/12 07:16:13 INFO mapreduce.Job: Job job_1507777811520_5091 running in uber mode : false
      17/10/12 07:16:13 INFO mapreduce.Job: map 0% reduce 0%
      17/10/12 07:22:37 INFO mapreduce.Job: Task Id : attempt_1507777811520_5091_m_000000_0, Status : FAILED
      Error: java.io.IOException: Stream closed
      at org.apache.hadoop.crypto.CryptoOutputStream.checkStream(CryptoOutputStream.java:268)
      at org.apache.hadoop.crypto.CryptoOutputStream.flush(CryptoOutputStream.java:255)
      at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
      at java.io.DataOutputStream.flush(DataOutputStream.java:123)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141)
      at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
      at org.apache.commons.io.output.ProxyOutputStream.close(ProxyOutputStream.java:117)
      at org.apache.sqoop.io.LobFile$V0Writer.close(LobFile.java:1669)
      at org.apache.sqoop.lib.LargeObjectLoader.close(LargeObjectLoader.java:96)
      at org.apache.sqoop.mapreduce.AvroImportMapper.cleanup(AvroImportMapper.java:79)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
      at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      

      The root cause of this issue is in the LobFile.close method, which is invoked from the Map cleanup. At line 1669 (per the stack trace) it tries to close the countingOut output stream, but the out output stream has already been closed at line 1664. Since out is just a wrapper around countingOut, both ultimately point to the same CryptoOutputStream instance, so by the time the call reaches line 1669 that instance is already closed. The failure occurs because java.io.BufferedOutputStream calls flush on the underlying stream it wraps (here, the CryptoOutputStream), reaching line 255 of CryptoOutputStream, which throws because the stream is already closed.
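A minimal sketch of the double-close described above, assuming simplified stand-ins for the real classes: StrictStream plays the CryptoOutputStream (it rejects any use after close, like CryptoOutputStream.checkStream), Java8StyleFilter plays the countingOut wrapper with Java 8's flush-then-close semantics inlined so the behavior is the same on any JDK, and a BufferedOutputStream plays out:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class LobFileCloseDemo {

    // Hypothetical stand-in for CryptoOutputStream: write()/flush() after
    // close() throw "Stream closed", like CryptoOutputStream.checkStream().
    static class StrictStream extends OutputStream {
        private boolean closed = false;
        @Override public void write(int b) throws IOException { check(); }
        @Override public void flush() throws IOException { check(); }
        @Override public void close() { closed = true; }
        private void check() throws IOException {
            if (closed) throw new IOException("Stream closed");
        }
    }

    // Wrapper with Java 8's FilterOutputStream.close() semantics inlined:
    // flush first, then close, and let a flush failure propagate.
    static class Java8StyleFilter extends OutputStream {
        private final OutputStream out;
        Java8StyleFilter(OutputStream out) { this.out = out; }
        @Override public void write(int b) throws IOException { out.write(b); }
        @Override public void flush() throws IOException { out.flush(); }
        @Override public void close() throws IOException {
            try (OutputStream o = out) {
                flush();
            }
        }
    }

    // Reproduces the two close() calls from LobFile$V0Writer.close().
    static String demo() {
        StrictStream crypto = new StrictStream();                  // the underlying encrypted stream
        OutputStream countingOut = new Java8StyleFilter(crypto);   // plays countingOut
        OutputStream out = new BufferedOutputStream(countingOut);  // plays out, a wrapper of countingOut
        try {
            out.close();         // like line 1664: also closes countingOut and crypto
            countingOut.close(); // like line 1669: close() flushes first,
                                 // hitting the already-closed crypto stream
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The second close fails exactly as in the stack trace: the wrapper's flush reaches the already-closed underlying stream. This also suggests the shape of the fix: close only the outermost stream in the chain, since closing it already closes everything it wraps.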

        Attachments

        1. SQOOP-3243.patch (3 kB, Boglarka Egyed)

              People

               • Assignee:
                 Boglarka Egyed
               • Reporter:
                 Boglarka Egyed
               • Votes: 0
               • Watchers: 4
