Hadoop Common / HADOOP-8009

Create hadoop-client and hadoop-minicluster artifacts for downstream projects

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 0.22.0, 0.23.0, 0.23.1, 0.24.0
    • Fix Version/s: 1.0.1, 0.23.1
    • Component/s: build
    • Labels: None
    • Release Note:
      Generate integration artifacts "org.apache.hadoop:hadoop-client" and "org.apache.hadoop:hadoop-minicluster" containing all the jars needed to use Hadoop client APIs, and to run Hadoop MiniClusters, respectively. Push these artifacts to the Maven repository on mvn deploy, along with the existing artifacts.

    Description

      Using Hadoop from projects like Pig/Hive/Sqoop/Flume/Oozie or any in-house system that interacts with Hadoop is quite challenging for the following reasons:

      • Different versions of Hadoop produce different artifacts: Before Hadoop 0.23 there was a single artifact, hadoop-core; starting with Hadoop 0.23 there are several (common, hdfs, mapred*, yarn*).
      • There are no 'client' artifacts: Current artifacts include all JARs needed to run the services, thus bringing into clients several JARs that are not used for job submission/monitoring (servlet, jsp, tomcat, jersey, etc.).
      • Testing on the client side is also quite challenging, as more artifacts have to be included than the dependencies define: for example, the history-server artifact has to be explicitly included. If using Hadoop 1 artifacts, jersey-server has to be explicitly included.
      • 3rd party dependencies change in Hadoop from version to version: This complicates things for projects that have to deal with multiple versions of Hadoop, as their exclusion lists become a huge mix & match of artifacts from different Hadoop versions, and things may break when a particular version of Hadoop requires a dependency that another version of Hadoop does not require.
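
      To illustrate the exclusion-list problem, a downstream POM written against the pre-0.23 hadoop-core artifact tends to look something like the sketch below. The version and the excluded coordinates are illustrative assumptions, not a verified list; the actual set has to be read from the POM of each Hadoop release being targeted, and it differs between releases.

      ```xml
      <!-- Sketch only: excluded coordinates are assumptions for illustration. -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.0.0</version>
        <exclusions>
          <!-- Server-side JARs a job-submitting client does not need -->
          <exclusion>
            <groupId>tomcat</groupId>
            <artifactId>jasper-runtime</artifactId>
          </exclusion>
          <exclusion>
            <groupId>tomcat</groupId>
            <artifactId>jasper-compiler</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>jetty</artifactId>
          </exclusion>
          <!-- further exclusions vary with the Hadoop version -->
        </exclusions>
      </dependency>
      ```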

      Because of this it would be quite convenient to have the following 'aggregator' artifacts:

      • org.apache.hadoop:hadoop-client : it includes all required JARs to use Hadoop client APIs (excluding all JARs that are not needed for it)
      • org.apache.hadoop:hadoop-minicluster : it includes all required JARs to run Hadoop Mini Clusters
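
      As a rough sketch of the intended usage, a downstream project would then declare only the two aggregator artifacts and let Maven resolve the rest transitively. The hadoop.version property name, the 1.0.1 value (taken from the Fix Version/s of this issue), and the test scope on hadoop-minicluster are assumptions made for this example, not something the issue prescribes.

      ```xml
      <properties>
        <!-- Assumed property name; pick the Hadoop release being targeted -->
        <hadoop.version>1.0.1</hadoop.version>
      </properties>

      <dependencies>
        <!-- Client APIs only, without the server-side JARs -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.version}</version>
        </dependency>

        <!-- Everything needed to run Mini Clusters; test scope is an assumption -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-minicluster</artifactId>
          <version>${hadoop.version}</version>
          <scope>test</scope>
        </dependency>
      </dependencies>
      ```

      With this layout, switching Hadoop versions would ideally come down to changing the single version property, with no per-version exclusion list to maintain.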

      These aggregator artifacts would be created for current branches under development (trunk, 0.22, 0.23, 1.0) and for released versions that are still in use.

      For branches under development, these artifacts would be generated as part of the build.

      For released versions, we would have a special branch used only as a vehicle for publishing the corresponding 'aggregator' artifacts.

      Attachments

        1. HADOOP-8009.patch
          14 kB
          Alejandro Abdelnur
        2. HADOOP-8009-existing-releases.patch
          27 kB
          Alejandro Abdelnur
        3. HADOOP-8009-branch-1.patch
          14 kB
          Alejandro Abdelnur
        4. HADOOP-8009-branch-1-add.patch
          0.8 kB
          Matthew Foley
        5. HADOOP-8009-branch-0_22.patch
          16 kB
          Alejandro Abdelnur

            People

              Assignee: tucu00 Alejandro Abdelnur
              Reporter: tucu00 Alejandro Abdelnur
              Votes: 0
              Watchers: 16
