Hadoop Common / HADOOP-13016

reinstate hadoop-hdfs as dependency of hadoop-client, create hadoop-lean-client for minimal deployments

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.0
    • Fix Version/s: None
    • Component/s: build
    • Labels: None

    Description

      The split of hadoop-hdfs and hadoop-hdfs-client is breaking code of mine whose builds declared a dependency on hadoop-client and expected all of HDFS to come in with it.

      I'm finding this first because I'm building and testing downstream code against branch-2; I have to explicitly declare a dependency on hadoop-hdfs to make things work again.
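
      A minimal sketch of that workaround, assuming a Maven build. The hadoop.version property is a placeholder introduced for illustration, not something defined in this issue; set it to whichever Hadoop version the downstream project builds against.

```xml
<!-- Workaround sketch: declare hadoop-hdfs explicitly next to hadoop-client. -->
<!-- ${hadoop.version} is a hypothetical property defined by the downstream POM. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <!-- No longer pulled in transitively after the hadoop-hdfs-client split. -->
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```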

      We've also seen problems downstream (e.g. Spark), where the move of the s3n classes to hadoop-aws has broken code which expects them to be there.

      At the same time, I see the merits of a lean, low-dependency client, which hadoop-client and its dependencies are not today.

      I propose:

      1. Reinstate hadoop-hdfs as a dependency of hadoop-client.
      2. Add hadoop-aws as a dependency of hadoop-client, but exclude the Amazon AWS JARs it would otherwise pull in.
      3. Create hadoop-lean-client for minimal deployments, stripping out all extraneous dependencies.
      4. For hadoop-lean-client, have a compatibility statement of "we will strip out anything we can from this, even over point releases". That is, anything that can be dropped in future, will be.
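
      Items 1 and 2 might look roughly like this in the hadoop-client POM. This is an illustrative sketch, not an actual patch: using ${project.version} for both dependencies and excluding everything under the com.amazonaws group ID are assumptions, and the wildcard exclusion requires Maven 3.2.1 or later.

```xml
<!-- Sketch only: possible additions to the hadoop-client dependency list. -->
<dependency>
  <!-- Item 1: bring the full HDFS artifact back in. -->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>${project.version}</version>
</dependency>
<dependency>
  <!-- Item 2: the s3n classes now live in hadoop-aws. -->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <!-- Assumption: drop every transitive artifact in the com.amazonaws group. -->
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

      With an exclusion like this, anything that actually needs the AWS SDK at runtime would have to declare it explicitly.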

      This will give downstream projects a choice: the old POM with everything, the lean POM for new apps.

      And, by reinstating hadoop-hdfs, things will build again.

          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)
            Votes: 0
            Watchers: 7
