Hadoop HDFS / HDFS-7040

HDFS dangerously uses @Beta methods from very old versions of Guava

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.0, 2.5.0, 2.4.1
    • Fix Version/s: None
    • Component/s: None

      Description

      HDFS uses LimitInputStream from Guava. That class was introduced as @Beta, which makes it risky for any application to rely on.

      The problem is further exacerbated by Hadoop's dependency on Guava version 11.0.2, which is quite old for an active project (Feb. 2012).

      Because Guava is very stable, projects that depend on Hadoop and use Guava themselves can use versions up through Guava 14.x.

      However, in version 14, Guava deprecated LimitInputStream and provided a replacement. Because Guava makes no compatibility guarantees for @Beta classes, it removed LimitInputStream entirely in version 15.

      What should be done: Hadoop should update its dependency on Guava to at least version 14 (Guava is currently on version 19). This should have little impact on users, because Guava is so stable.

      HDFS should then be patched to use the provided alternative to LimitInputStream, so that downstream packagers, users, and application developers requiring more recent versions of Guava (to fix bugs, to use new features, etc.) will be able to swap out the Guava dependency without breaking Hadoop.
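The call-site change this asks for is small. A minimal sketch of the swap, assuming only the EOF-after-N-bytes contract: `ByteStreams.limit` is the real Guava 14+ replacement named above, while the `limit` helper below is a hypothetical JDK-only stand-in (it buffers eagerly, unlike Guava's lazy wrapper) used just to keep the sketch compilable without a Guava dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LimitSwapSketch {
    // Before (Guava 11.x, @Beta, removed in Guava 15):
    //   InputStream section = new LimitInputStream(raw, sectionLength);
    // After (Guava 14+):
    //   InputStream section = ByteStreams.limit(raw, sectionLength);
    //
    // Hypothetical stand-in with the same caller-visible contract (EOF
    // after `len` bytes), implemented with only the JDK so this sketch
    // compiles without Guava. Unlike ByteStreams.limit, it reads the
    // bytes up front rather than wrapping the stream lazily.
    static InputStream limit(InputStream raw, long len) throws IOException {
        return new ByteArrayInputStream(raw.readNBytes((int) len)); // JDK 11+
    }
}
```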

      Alternative: While Hadoop cannot predict the marking and removal of deprecated code, it can, and should, avoid @Beta classes and methods, which offer no compatibility guarantees. If the dependency cannot be bumped, it should be relatively trivial to provide an internal class with the same functionality that does not rely on the older version of Guava.
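The internal-class alternative could look roughly like the sketch below. The class name `BoundedInputStream` and the exact overrides are illustrative assumptions, not the actual Hadoop patch; only the EOF-after-N-bytes contract of LimitInputStream is reproduced, and mark/reset is disabled to keep the byte accounting simple.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative internal replacement for Guava's @Beta LimitInputStream. */
public class BoundedInputStream extends FilterInputStream {
    private long remaining; // bytes the caller may still read

    public BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // report EOF once the limit is exhausted
        }
        int b = in.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        // Never ask the underlying stream for more than the remaining quota.
        int n = in.read(b, off, (int) Math.min(len, remaining));
        if (n != -1) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the byte accounting
    }
}
```

Extending FilterInputStream keeps close() and the other methods delegating to the wrapped stream; only the read paths need the quota bookkeeping.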


          Activity

          ctubbsii Christopher Tubbs added a comment -

          Patch attached to remove use of LimitInputStream: 0001-HDFS-7040-Avoid-beta-LimitInputStream-in-Guava.patch for use in 2.4 branch and later. (See MAPREDUCE-6083 for corresponding change in the MAPREDUCE component that affects the client API.)

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12667911/12667911_0001-HDFS-7040-Avoid-beta-LimitInputStream-in-Guava.patch
          against trunk revision 5ec7fcd.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7993//console

          This message is automatically generated.

          ctubbsii Christopher Tubbs added a comment -

          This patch needs manual QA testing, as it is intended to be applied to the 2.4 branch.

          stevel@apache.org Steve Loughran added a comment -

          This is the long-standing "hadoop branch-2 guava is way out of date" problem, seen in HDFS-5518 and addressed in HADOOP-10101.

          I think Hadoop does have to bite the bullet and update Guava, and I think the 2.7 "Java 7+ only" release is the time to do it. It may break some downstream code (the argument for holding back), but staying on 11.x is breaking code anyway.

          Christopher, is there any easy way to switch off the missing methods in the Hadoop code while still having it build against 11.0.2?

          ctubbsii Christopher Tubbs added a comment -

          "...is there any easy way to switch off the missing methods in the Hadoop code while still having it build against 11.0.2?"

          Sure, the easiest would be to (re-)implement the LimitInputStream functionality internally within HDFS code. (Minimally, copy the private inner class implementation of ByteStreams.limit() from a newer version, which is called LimitedInputStream. A small note in the NOTICE file to give credit to that project should suffice (it's already Apache License v2.0)). I'd do this myself and provide a patch, but I'm having a very hard time getting Hadoop to build out of the box (I think I need to track down a protobuf dependency).

          That said, I think bumping the dependency is probably pretty low risk (also, this JAPI checker chart I found might help in evaluating risk: http://upstream-tracker.org/java/versions/guava.html).

          ctubbsii Christopher Tubbs added a comment -

          To the comment about bumping the Guava version in 2.7, I think that's reasonable, but I really hope something can be done about MAPREDUCE-6083 for 2.6.0, because that's a much more serious manifestation of this same issue (it shows up in client code paths, not just server/MiniDFS code paths). Whatever action is taken for that issue could be applied here as well, at least in 2.6.0, if not all the way back to 2.4.0.

          busbey Sean Busbey added a comment -

          FWIW, HBase currently has a copied reimplementation of LimitedInputStream (since HBASE-9667) for this same reason.

          ctubbsii Christopher Tubbs added a comment -

          I added a patch to fix this under MAPREDUCE-6083 for versions 2.6.0 and later which doesn't change the Guava version dependency. I suppose it could be back-ported to earlier versions (2.4/2.5), but it's probably not worth it since those versions are really only affected by MiniDFSCluster, and that's very limited.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12667911/12667911_0001-HDFS-7040-Avoid-beta-LimitInputStream-in-Guava.patch
          against trunk revision 4f18018.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9123//console

          This message is automatically generated.

          ctubbsii Christopher Tubbs added a comment -

          This issue is essentially fixed with HADOOP-11286 for 2.6 and later, so there's no reason to keep it open, unless somebody intends to fix versions 2.4 and 2.5, which I would argue is probably not worth the effort. So, I'm going to close it.


            People

            • Assignee: Unassigned
            • Reporter: ctubbsii Christopher Tubbs
            • Votes: 1
            • Watchers: 7
