Hadoop Common / HADOOP-9991

Fix up Hadoop Poms for enforced dependencies, roll up JARs to latest versions

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1-beta, 2.3.0
    • Fix Version/s: None
    • Component/s: build
    • Labels: None
    • Target Version/s:

      Description

      If you try using Hadoop downstream with a classpath shared with HBase and Accumulo, you soon discover how messy the dependencies are.

      Hadoop's side of this problem is:

      1. not being up to date with some of the external releases of common JARs
      2. not locking down/excluding inconsistent versions of artifacts provided down the dependency graph

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Steve Loughran created issue -
          Steve Loughran added a comment -

          Proposed Fixes

           1. Turn on Maven enforcement in the build to highlight inconsistencies coming in from below (e.g. Avro's SLF4J dependency); a configuration sketch follows below.
           2. Fix those inconsistencies by excluding the conflicting artifacts pulled in by dependencies.
           3. Add explicit imports and scope limits on all dependencies, with version numbers we manage.
           4. Tighten the downstream exported dependencies, so that hadoop-client only declares dependencies on the JARs it really needs (not, say, JUnit).
           5. Enumerate later versions of JARs that we can easily migrate to simply by incrementing version numbers, and do that in trunk, with the enforcer identifying more dependency problems to address.
           6. Identify low-cost updates (ideally those with patches already in, like the JetS3t/S3 patch) and selectively apply them, again fixing problems as they surface.

          I'd push this all at trunk, though items 1-4 could be backported to 2.x once complete.
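
          As an illustration of item 1, a minimal maven-enforcer-plugin configuration of the kind proposed might look like the following; the exact rule set shown here is an assumption for illustration, not the eventual patch.

              <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-enforcer-plugin</artifactId>
                <executions>
                  <execution>
                    <id>enforce-dependencies</id>
                    <goals>
                      <goal>enforce</goal>
                    </goals>
                    <configuration>
                      <rules>
                        <!-- fail the build when two paths pull in different versions of the same artifact -->
                        <dependencyConvergence/>
                        <!-- keep releases reproducible by rejecting SNAPSHOT dependencies at release time -->
                        <requireReleaseDeps>
                          <onlyWhenRelease>true</onlyWhenRelease>
                        </requireReleaseDeps>
                      </rules>
                    </configuration>
                  </execution>
                </executions>
              </plugin>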

          Steve Loughran added a comment -

          Enforcement is on; the bigger issue now is excessive exports to things downstream, especially with HBase in the mix, as that is where version number problems start to surface.

          This is the main set of "exclusions because they don't appear to be used", though HDFS's JSP pages may well need jasper. The jersey-test-framework-grizzly2 dependency (branch-2.1.1) is clearly spurious:

              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-minicluster</artifactId>
                <version>${hadoop.version}</version>
                <scope>test</scope>
                <exclusions>
                  <exclusion>
                    <groupId>com.sun.jersey.jersey-test-framework</groupId>
                    <artifactId>jersey-test-framework-grizzly2</artifactId>
                  </exclusion>
                </exclusions>
              </dependency>
          
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>${hadoop.version}</version>
                <exclusions>
                  <exclusion>
                    <groupId>tomcat</groupId>
                    <artifactId>jasper-runtime</artifactId>
                  </exclusion>
                </exclusions>
              </dependency>
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-yarn-server-common</artifactId>
                <version>${hadoop.version}</version>
                <exclusions>
                  <exclusion>
                    <groupId>com.sun.jersey.jersey-test-framework</groupId>
                    <artifactId>jersey-test-framework-grizzly2</artifactId>
                  </exclusion>
                </exclusions>
              </dependency>
          
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-yarn-client</artifactId>
                <version>${hadoop.version}</version>
                <exclusions>
                  <exclusion>
                    <groupId>com.sun.jersey.jersey-test-framework</groupId>
                    <artifactId>jersey-test-framework-grizzly2</artifactId>
                  </exclusion>
                </exclusions>      
              </dependency>
          
          Steve Loughran added a comment -

          MapReduce is marking JUnit as compile scoped, not test, so it trickles into the tarballs and downstream.
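
          A minimal sketch of the corrected declaration, assuming the JUnit version is managed in a parent dependencyManagement section:

              <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <scope>test</scope>
              </dependency>

          With test scope, the artifact stays out of the compile and runtime classpaths that downstream consumers inherit.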

          Steve Loughran added a comment -

          Downstream of the hadoop-client artifact, this is what you get in your dependency graph:

          [INFO] \- org.apache.hadoop:hadoop-client:pom:2.1.2-SNAPSHOT:compile
          [INFO]    +- org.apache.hadoop:hadoop-common:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  +- commons-cli:commons-cli:jar:1.2:compile
          [INFO]    |  +- org.apache.commons:commons-math:jar:2.1:compile
          [INFO]    |  +- xmlenc:xmlenc:jar:0.52:compile
          [INFO]    |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
          [INFO]    |  +- commons-io:commons-io:jar:2.1:compile
          [INFO]    |  +- commons-logging:commons-logging:jar:1.1.1:compile
          [INFO]    |  +- log4j:log4j:jar:1.2.17:compile
          [INFO]    |  +- commons-lang:commons-lang:jar:2.5:compile
          [INFO]    |  +- commons-configuration:commons-configuration:jar:1.6:compile
          [INFO]    |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
          [INFO]    |  |  +- commons-digester:commons-digester:jar:1.8:compile
          [INFO]    |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
          [INFO]    |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
          [INFO]    |  +- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
          [INFO]    |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.7:compile
          [INFO]    |  +- org.apache.avro:avro:jar:1.7.4:compile
          [INFO]    |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
          [INFO]    |  |  \- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile
          [INFO]    |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
          [INFO]    |  +- org.apache.hadoop:hadoop-auth:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
          [INFO]    |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
          [INFO]    |     \- org.tukaani:xz:jar:1.0:compile
          [INFO]    +- org.apache.hadoop:hadoop-hdfs:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  +- com.google.guava:guava:jar:11.0.2:compile
          [INFO]    |  |  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
          [INFO]    |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
          [INFO]    |  +- commons-codec:commons-codec:jar:1.4:compile
          [INFO]    |  \- org.codehaus.jackson:jackson-core-asl:jar:1.9.7:compile
          [INFO]    +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  |  +- org.apache.hadoop:hadoop-yarn-client:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  |  |  +- com.google.inject:guice:jar:3.0:compile
          [INFO]    |  |  |  |  +- javax.inject:javax.inject:jar:1:compile
          [INFO]    |  |  |  |  \- aopalliance:aopalliance:jar:1.0:compile
          [INFO]    |  |  |  +- com.sun.jersey:jersey-server:jar:1.9:compile
          [INFO]    |  |  |  |  +- asm:asm:jar:3.1:compile
          [INFO]    |  |  |  |  \- com.sun.jersey:jersey-core:jar:1.9:compile
          [INFO]    |  |  |  +- com.sun.jersey:jersey-json:jar:1.9:compile
          [INFO]    |  |  |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
          [INFO]    |  |  |  |  |  \- stax:stax-api:jar:1.0.1:compile
          [INFO]    |  |  |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
          [INFO]    |  |  |  |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
          [INFO]    |  |  |  |  |     \- javax.activation:activation:jar:1.1:compile
          [INFO]    |  |  |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile
          [INFO]    |  |  |  |  \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile
          [INFO]    |  |  |  \- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
          [INFO]    |  |  \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  +- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  \- org.slf4j:slf4j-api:jar:1.7.5:compile
          [INFO]    +- org.apache.hadoop:hadoop-yarn-api:jar:2.1.2-SNAPSHOT:compile
          [INFO]    +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.1.2-SNAPSHOT:compile
          [INFO]    |  \- org.apache.hadoop:hadoop-yarn-common:jar:2.1.2-SNAPSHOT:compile
          [INFO]    +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.1.2-SNAPSHOT:compile
          [INFO]    \- org.apache.hadoop:hadoop-annotations:jar:2.1.2-SNAPSHOT:compile
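
          A graph like this can be regenerated in any downstream project with the maven-dependency-plugin's tree goal; a minimal sketch of wiring that into a build follows (the output location is illustrative only):

              <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                  <execution>
                    <id>print-dependency-tree</id>
                    <phase>verify</phase>
                    <goals>
                      <goal>tree</goal>
                    </goals>
                    <configuration>
                      <!-- write the tree to a file instead of the console; the path is an example -->
                      <outputFile>${project.build.directory}/dependency-tree.txt</outputFile>
                    </configuration>
                  </execution>
                </executions>
              </plugin>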
          
          
          Steve Loughran added a comment -
          1. hadoop-hdfs should be excluding com.google.code.findbugs:jsr305:jar:1.3.9 from guava (or downgrade it to provided scope).
          2. hadoop-mapreduce-client-common should not need to depend on yarn-server-common, or jersey-server, though jax
          3. hadoop-common should mark the ZooKeeper dependency as <provided>, with the HDFS server declaring it for HA. (A sketch of items 1 and 3 follows this list.)
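
          A minimal sketch of what items 1 and 3 could look like in the relevant POMs; the placement and surrounding version management are assumptions, not the actual patch:

              <!-- item 1: keep jsr305 off the downstream classpath -->
              <dependency>
                <groupId>com.google.guava</groupId>
                <artifactId>guava</artifactId>
                <exclusions>
                  <exclusion>
                    <groupId>com.google.code.findbugs</groupId>
                    <artifactId>jsr305</artifactId>
                  </exclusion>
                </exclusions>
              </dependency>

              <!-- item 3: hadoop-common stops exporting ZooKeeper to downstream consumers -->
              <dependency>
                <groupId>org.apache.zookeeper</groupId>
                <artifactId>zookeeper</artifactId>
                <scope>provided</scope>
              </dependency>
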
          Steve Loughran made changes -
          Link This issue is related to HADOOP-10067 [ HADOOP-10067 ]
          Ted Yu made changes -
          Attachment hadoop-9991-v1.txt [ 12613654 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-9594 [ HADOOP-9594 ]
          Steve Loughran made changes -
          Link This issue depends upon MAPREDUCE-5431 [ MAPREDUCE-5431 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-9613 [ HADOOP-9613 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-9611 [ HADOOP-9611 ]
          Steve Loughran made changes -
          Link This issue depends upon HDFS-5411 [ HDFS-5411 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-10075 [ HADOOP-10075 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-10076 [ HADOOP-10076 ]
          Steve Loughran made changes -
          Link This issue depends upon MAPREDUCE-5624 [ MAPREDUCE-5624 ]
          stack added a comment -

          Steve Loughran Thanks for taking on this masochistic task. Given hadoop's surface area, it is hard to figure out if a dependency is needed. How about we go radical in trunk, since it will be ten years before there is a hadoop3, which should be time enough to figure out which removed dependencies should have been left in. hadoop2 could do w/ just a purge of at least the just-not-used. I'd like to help out. In hbase we put on a few filters to try and block the plainly farcical, but god-bless-maven, you have to reproduce this set each time for each hadoop version we build against since there's no xinclude...

          Steve Loughran added a comment -

          Stack, HBase's needs are one of the things I'm thinking of, and once you get to the goal of including Hadoop, HBase and Accumulo in one project, things get worse, to the extent that I couldn't even pull in htmlunit: https://github.com/hortonworks/hoya/blob/master/pom.xml

          1. We should really add ZK and Avro into the mix here too.
          2. I'd love to see stripped-down clients, which we could address by having some lean POMs that only include the core bits to talk to localfs, HDFS, YARN and MR, with all the optional client-side bits (Avro integration, ftpfs, s3fs) pulled to the side where you can add them if you explicitly want/need them. Yes, this adds more POMs, but downstream it'll be better. (A sketch of such a lean POM follows this list.)
          3. I'd like to get a lot of this cleanup into 2.3.
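
          Purely as an illustration of item 2, a lean client POM might look something like the following; the artifactId and the exact dependency set are hypothetical, not an agreed design:

              <project xmlns="http://maven.apache.org/POM/4.0.0">
                <modelVersion>4.0.0</modelVersion>
                <parent>
                  <groupId>org.apache.hadoop</groupId>
                  <artifactId>hadoop-project</artifactId>
                  <version>2.1.2-SNAPSHOT</version>
                </parent>
                <!-- hypothetical artifact: a client that only talks to HDFS and YARN -->
                <artifactId>hadoop-client-minimal</artifactId>
                <packaging>pom</packaging>
                <dependencies>
                  <dependency>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-hdfs</artifactId>
                  </dependency>
                  <dependency>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-yarn-client</artifactId>
                  </dependency>
                  <!-- optional pieces (s3fs, ftpfs, Avro integration) deliberately left out;
                       downstream projects add them explicitly when needed -->
                </dependencies>
              </project>
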
          Steve Loughran made changes -
          Link This issue is related to BOOKKEEPER-708 [ BOOKKEEPER-708 ]
          Rakesh R made changes -
          Link This issue depends upon HADOOP-10101 [ HADOOP-10101 ]
          Christopher Tubbs made changes -
          Link This issue is related to HADOOP-8793 [ HADOOP-8793 ]
          Colin Patrick McCabe added a comment -

          Thanks for taking a look at this task. It's a difficult one.

          One thing I'd like to add is that we now have httpcore-4.2.5 in trunk combined with httpclient-3.1. However, these two jars provide many of the same classes! The reason is that the Apache httpclient library was end-of-lifed and folded into the httpcore project.

          I am concerned about this dependency, since it seems like we should be including one or the other, but not both, given that they provide some of the same classes. I tried getting rid of httpclient 3.1, but it is not possible, since we use the custom URI class which is implemented there and which was dropped in httpcore (they advise using java.net.URI instead). Perhaps we could weed out these uses of the custom URI and try dropping the old client?
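
          If the old client does eventually get weeded out, one way to keep it from creeping back would be a bannedDependencies rule in the enforcer configuration discussed above; this is only a sketch of that idea, not a proposed patch:

              <rules>
                <bannedDependencies>
                  <excludes>
                    <!-- once migrated off the EOL'd 3.x client, reject any transitive reappearance -->
                    <exclude>commons-httpclient:commons-httpclient</exclude>
                  </excludes>
                </bannedDependencies>
              </rules>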

          Steve Loughran added a comment -

          Colin, that should be a separate JIRA.

          I'm trying to triage changes:

          1. Low risk, no code changes
          2. Minor code changes and/or medium risk
          3. Major reworks and/or dependencies known to be brittle

          The httpclient stuff is odd, as it's actually been pretty reliable, and far better than the java.net code. It's just that there are now two versions in there, which at least don't conflict. What risks/harms are there from leaving it in, other than binary bloat and getting onto the classpath of downstream things? As there aren't any later versions to conflict with, that shouldn't be more than an inconvenience downstream.

          Steve Loughran made changes -
          Link This issue is related to HADOOP-10100 [ HADOOP-10100 ]
          Colin Patrick McCabe made changes -
          Link This issue is related to HADOOP-10105 [ HADOOP-10105 ]
          Vinayakumar B made changes -
          Link This issue is related to HADOOP-9905 [ HADOOP-9905 ]
          Vinayakumar B added a comment -

          Thanks Steve for initiating this major task.
          Adding to the points above, there are many duplicate JARs in the distribution: hdfs/lib, mapreduce/lib, tools/lib and yarn/lib contain ~90% of the same JARs present in common/lib.

          Is there any specific reason to keep these multiple copies of the jars?

          Steve Loughran added a comment -

          Vinay, I think that's a separate issue, as it's about packaging in the tar files. Feel free to open a JIRA on it; I'm only looking at the POM files and their implications for downstream projects.

          Vinayakumar B added a comment -

          Ok Steve. No problem.
          Filed HADOOP-10115 for the packaging duplicate jars issue. Thanks

          Steve Loughran made changes -
          Link This issue depends upon HADOOP-10103 [ HADOOP-10103 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-10103 [ HADOOP-10103 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-10147 [ HADOOP-10147 ]
          Steve Loughran added a comment -

          HADOOP-10147 covers the commons-logging update

          Ted Yu made changes -
          Link This issue depends upon MAPREDUCE-5678 [ MAPREDUCE-5678 ]
          Steve Loughran made changes -
          Assignee Steve Loughran [ stevel@apache.org ]
          Arun C Murthy made changes -
          Affects Version/s 2.3.0 [ 12325254 ]
          Affects Version/s 2.4.0 [ 12324587 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-9833 [ HADOOP-9833 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-10384 [ HADOOP-10384 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-9244 [ HADOOP-9244 ]
          Steve Loughran made changes -
          Link This issue is related to HADOOP-9961 [ HADOOP-9961 ]
          Steve Loughran added a comment -

          Link to HADOOP-9961 which pushed up a couple of dependencies

          Steve Loughran made changes -
          Link This issue is related to HADOOP-10530 [ HADOOP-10530 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-9555 [ HADOOP-9555 ]
          Steve Loughran added a comment -

          HADOOP-9555 updates ZK to 3.4.6

          Steve Loughran made changes -
          Link This issue is related to HADOOP-10783 [ HADOOP-10783 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-10814 [ HADOOP-10814 ]
          Steve Loughran made changes -
          Link This issue depends upon HDFS-7376 [ HDFS-7376 ]
          Steve Loughran added a comment -

          HDFS-7376 proposes upgrading jsch to avoid Java 7 problems.

          Steve Loughran made changes -
          Link This issue relates to HADOOP-11492 [ HADOOP-11492 ]
          Steve Loughran made changes -
          Link This issue is blocked by HADOOP-11755 [ HADOOP-11755 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-11086 [ HADOOP-11086 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-11946 [ HADOOP-11946 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-12064 [ HADOOP-12064 ]
          Steve Loughran made changes -
          Link This issue is related to HADOOP-12232 [ HADOOP-12232 ]
          Steve Loughran made changes -
          Link This issue depends upon HADOOP-12281 [ HADOOP-12281 ]
          Prasanth Jayachandran added a comment -

          Steve Loughran With HIVE-11304 I am trying to migrate from log4j 1.x to log4j 2.x. Are there any plans for the same in hadoop?

          Steve Loughran added a comment -

          Hadoop only explicitly gets at log4j in some tests, so code-wise, excluding those, switching to log4j 2 is technically just a matter of changing the Maven version.

          Except

          • we don't know what happens downstream
          • logging is essential to all services
          • rolling logs of YARN services matters too.

          For those reasons, it's hard to see much enthusiasm for a change, especially as async logging has a price: when things fail, the log may not be complete.

          What is possible is to move Hadoop's logging off commons-logging and onto SLF4J; that's something we'd like to see in new code, even though migrating old code is TBD. Do that, and switching to log4j 2 or even logback can be done by changing the SLF4J binding dependency (see the sketch below).
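
          As an illustration of that last point, and purely as a sketch of the idea rather than a committed plan, a project coding against slf4j-api could pick its backend by swapping the binding; the logback version shown is illustrative only:

              <!-- code depends only on the API -->
              <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-api</artifactId>
                <version>1.7.5</version>
              </dependency>

              <!-- today: route SLF4J to log4j 1.2 -->
              <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
                <version>1.7.5</version>
                <scope>runtime</scope>
              </dependency>

              <!-- alternative: drop the binding above and route to logback instead -->
              <!--
              <dependency>
                <groupId>ch.qos.logback</groupId>
                <artifactId>logback-classic</artifactId>
                <version>1.1.2</version>
                <scope>runtime</scope>
              </dependency>
              -->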


            People

            • Assignee:
              Steve Loughran
              Reporter:
              Steve Loughran
            • Votes:
              3
              Watchers:
              24

              Dates

              • Created:
                Updated:

                Development