Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: build
    • Labels: None

Description

      In the Hadoop libraries you'll find both Guava 11.0.2 (test scope, IIRC) and 14.0.1, both of which are very outdated. 14.0.1 removes things used in 11.0.2, and 15.0 has removed things still in use by Hadoop code written against 14.0.1.

      In our experience through CDH3, 4, and 5, Guava (along with Jackson and SLF4J 1.7.5) has been the biggest cause of classpath issues.
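
      For a concrete example of the kind of removal that bites (a minimal sketch; com.google.common.io.LimitInputStream was deprecated in Guava 14 and removed in 15.0):

        // Compiles against Guava 11.0.2 or 14.0.1, but fails with
        // java.lang.NoClassDefFoundError: com/google/common/io/LimitInputStream
        // at runtime when a Guava 15.0+ jar wins on the classpath.
        import com.google.common.io.LimitInputStream;

        import java.io.ByteArrayInputStream;
        import java.io.InputStream;

        public class GuavaRemovalDemo {
          public static void main(String[] args) throws Exception {
            InputStream in = new ByteArrayInputStream(new byte[64]);
            // Removed in Guava 15.0; the replacement there is ByteStreams.limit(in, 16)
            InputStream limited = new LimitInputStream(in, 16);
            System.out.println(limited.read());
          }
        }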

Issue Links

    • duplicates: HADOOP-10101

Activity

          Andrew Wang added a comment -

          This issue has come up many times, but unfortunately this is a dependency we can't bump without breaking compatibility. I've expressed a -1 to changing this in the 2.x line, and I'll reiterate it here. It's definitely, definitely something I want to have fixed for Hadoop 3.0 though.

          Tim Robertson added a comment -

          Thanks Andrew Wang. Is there anything that documents the compatibility issues? I know Guava has removed some packages, but do you think there might be options such as bumping the version and restoring the removed Guava packages in the Hadoop common project, marking them as deprecated for removal in 3? Or perhaps you already know it is way more complex than that?

          Andrew Wang added a comment -

          Hey Tim,

          The compat issue I'm thinking of is if an application is depending on the exact Guava 11 being exposed on the Hadoop classpath. If we change the Guava version, this application will break when it tries to run against the new version of Hadoop. Providing our own versions of removed functions isn't bulletproof since an app could be depending on the behavior from our exact version of Guava. If a function's behavior is different in different versions of Guava, we can't expose all of them at once. I also don't think the Hadoop project wants to be in the business of maintaining our own "compatible" version of Guava.

          Basically we're hamstrung since we leak our dependencies. Old apps can depend on our ancient deps, so we have to keep them for compat even though it's quite painful. For Hadoop 3, one option is simply shading all our deps, but that's still under discussion.
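
          To sketch what shading would mean in practice (illustrative only; the maven-shade-plugin configuration below and the relocated package name are assumptions, not a committed plan):

            <plugin>
              <groupId>org.apache.maven.plugins</groupId>
              <artifactId>maven-shade-plugin</artifactId>
              <executions>
                <execution>
                  <phase>package</phase>
                  <goals>
                    <goal>shade</goal>
                  </goals>
                  <configuration>
                    <relocations>
                      <relocation>
                        <!-- Rewrite Guava's classes, and all references to them,
                             into a private namespace so they can't collide with
                             whatever Guava the application ships -->
                        <pattern>com.google.common</pattern>
                        <shadedPattern>org.apache.hadoop.shaded.com.google.common</shadedPattern>
                      </relocation>
                    </relocations>
                  </configuration>
                </execution>
              </executions>
            </plugin>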

          For Hadoop 2, there are some possible mitigations. YARN has some classloader and classpath mangling features which can help, but that doesn't help if you're running outside of a YARN container. You can also try just putting your newer deps at the front of the classpath and hoping for the best, but that's unsupported/untested, and is probably something you've already wrestled with.
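
          For reference, the knobs I mean look roughly like this in the job configuration (a sketch; exact behavior varies across 2.x releases):

            <!-- Option 1: put the user's jars ahead of Hadoop's on the task classpath -->
            <property>
              <name>mapreduce.job.user.classpath.first</name>
              <value>true</value>
            </property>

            <!-- Option 2: isolate user classes in a separate classloader -->
            <property>
              <name>mapreduce.job.classloader</name>
              <value>true</value>
            </property>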

          Tim Robertson added a comment -

          Thanks again, and that is a fair point about maintaining legacy code.

          This is an honest question: when updating to later versions of the Hadoop libs (or a minor version in a distribution, such as CDH 5.2.x -> 5.3.x), isn't it expected that applications will have to adapt somewhat? Especially when people are relying on transitive dependencies, might it be enough to coach folks towards using mvn dependency:analyze more rigorously?

          Aside: we've worked through our problems and are about to go to production on YARN, upgrading from MR1 and CDH4.3, but it was not without pain: a fair amount of pom exclusions (logging, and unwanted MR1 things coming in), downgrading libs (Jackson and Guava), setting user classpath precedence, and working around issues like https://issues.apache.org/jira/browse/OOZIE-2066.
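
          For anyone hitting the same thing, the exclusions were along these lines (a trimmed, illustrative snippet, not our exact pom):

            <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-client</artifactId>
              <version>${hadoop.version}</version>
              <exclusions>
                <!-- Keep Hadoop's Guava off our compile classpath; we pin our own -->
                <exclusion>
                  <groupId>com.google.guava</groupId>
                  <artifactId>guava</artifactId>
                </exclusion>
                <!-- Drop the logging implementation pulled in transitively -->
                <exclusion>
                  <groupId>log4j</groupId>
                  <artifactId>log4j</artifactId>
                </exclusion>
              </exclusions>
            </dependency>

          mvn dependency:tree -Dincludes=com.google.guava is handy for finding where each copy sneaks in.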

          Andrew Wang added a comment -

          I agree there's a gap between our promised compat guidelines and the reality. Things are supposed to work between minor releases without fighting transitive deps like that, but JAR version bumps sneak in and break things. The recent issues with Jackson in 2.5 are a good example: the version number change was quite minor, but it removed some methods that were used in a lot of downstreams (HBase, Solr, etc.), and we didn't realize until after it was released. That was a lesson learned for everyone, and hopefully something we can avoid in the future.

          Steve Loughran added a comment -

          Tim, we all see the problems, and know that Guava is the core trouble spot. If they didn't cull things across versions, we'd have upgraded it the way we slowly upgrade other dependencies.

          Hadoop 2.7 will be java7+ only, and we have an opportunity to move up some of the dependencies (jetty, servlet APIs). Guava is too sensitive though...at best we can make sure that Hadoop works against Guava-latest, even if we still ship 11.x. Help there welcome.
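
          For anyone wanting to help with that: assuming hadoop-project's pom still routes the dependency through a guava.version property (an assumption; check the pom), a first pass is just building and testing with it overridden and triaging what breaks:

            # Build and run the tests against a newer Guava without changing what we ship
            mvn clean install -Dguava.version=18.0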

          The other thing under discussion is having a lean hadoop-client lib, but as the HDFS client uses some of the Guava data structures, we can't make that a Guava-free library.

          Closing as a duplicate of HADOOP-10101 so that we can keep all this pain in one single place.


People

    • Assignee: Unassigned
    • Reporter: Tim Robertson
    • Votes: 0
    • Watchers: 4
