Hive / HIVE-3423

merge_dynamic_partition.q is failing when running hive on real cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

      Description

      merge_dynamic_partition (and a number of other qfiles) is failing when running the current hive on a real cluster:

      java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream
      at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
      at org.apache.hadoop.mapred.Child.main(Child.java:262)
      Caused by: java.lang.NoClassDefFoundError: org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:644)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:613)
      at

          Activity

          Zhenxiao Luo added a comment -

          This is a dependency problem.

          In common/src/java/org/apache/hadoop/hive/common/FileUtils.java, tar() uses GzipCompressorOutputStream and imports the class at the top of the file.

          When FileSinkOperator calls FileUtils.makePartName(dpColNames, row) from its getDynPartDirectory() method, this NoClassDefFoundError is triggered.

          This happens because both tar() and makePartName() are static methods of FileUtils. The first use of any static member forces the JVM to load, verify, and initialize the whole class, and verification can require resolving classes referenced anywhere in it, including in methods that are never called.

          So when makePartName() is called from FileSinkOperator.getDynPartDirectory(), all of FileUtils is loaded, and the JVM tries to resolve tar()'s GzipCompressorOutputStream dependency.

          GzipCompressorOutputStream comes from commons-compress*.jar. That dependency is declared in common/ivy.xml, but it is not bundled into hive-exec.jar.

          When running on a single machine the tests pass, since commons-compress*.jar is resolved by Ivy and sits in build/ivy/lib/default.

          But when running on a real cluster, commons-compress*.jar is not on every mapper's and reducer's classpath, so this NoClassDefFoundError is triggered.

          My proposed solution is to add commons-compress*.jar into hive-exec.jar.
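
          To make the loading behavior concrete, here is a minimal sketch (hypothetical class names, not Hive's actual code): compile it with commons-compress on the classpath, then run it without the jar.

              // Utils stands in for FileUtils: one static method needs commons-compress,
              // the other is pure JDK.
              final class Utils {
                  static void tar(String path) throws java.io.IOException {
                      // Assigning the gzip stream to OutputStream can force the verifier
                      // to load GzipCompressorOutputStream while Utils is being verified,
                      // even though tar() itself is never called below.
                      java.io.OutputStream out =
                          new org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream(
                              new java.io.FileOutputStream(path));
                      out.close();
                  }

                  static String makePartName(String col, String val) {
                      return col + "=" + val; // no external dependency of its own
                  }
              }

              // Sketch stands in for FileSinkOperator: its first use of Utils triggers
              // loading and verifying the whole Utils class.
              public class Sketch {
                  public static void main(String[] args) {
                      // Without commons-compress*.jar on the runtime classpath this line
                      // fails with java.lang.NoClassDefFoundError:
                      // org/apache/commons/compress/compressors/gzip/GzipCompressorOutputStream
                      System.out.println(Utils.makePartName("ds", "2012-09-01"));
                  }
              }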

          Zhenxiao Luo added a comment -

          Review Request submitted at:
          https://reviews.facebook.net/D5103

          Edward Capriolo added a comment -

          This approach is fine. Do you think we could simply redo FileUtils so it is not a final class and remove the static methods? Or is it the case that we have to include commons-compress in hive-exec regardless?
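
          For illustration only (hypothetical class names; not the actual HIVE-3295 patch), a split along those lines could isolate the commons-compress dependency in its own class, so that loading the partition-name helper never touches it:

              // PartitionUtils holds only the pure-JDK logic; loading this class
              // never forces the JVM to resolve any commons-compress type.
              final class PartitionUtils {
                  static String makePartName(String col, String val) {
                      return col + "=" + val;
                  }
              }

              // ArchiveUtils holds all commons-compress usage; the dependency is
              // resolved only by code paths that actually create archives.
              final class ArchiveUtils {
                  static void tar(String path) throws java.io.IOException {
                      java.io.OutputStream out =
                          new org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream(
                              new java.io.FileOutputStream(path));
                      out.close();
                  }
              }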

          Zhenxiao Luo added a comment -

          @Edward: I just found that HIVE-3295 already fixes this bug with a redo of FileUtils. This is definitely better than adding commons-compress into hive-exec. Thanks a lot for your guidance. How about I take HIVE-3295's patch and mark this one as a duplicate?

          Ashutosh Chauhan added a comment -

          This issue was fixed and released as part of the 0.10.0 release. If you find an issue that seems related to this one, please create a new JIRA and link it to this one.


            People

            • Assignee: Zhenxiao Luo
            • Reporter: Zhenxiao Luo
            • Votes: 0
            • Watchers: 3
