Spark / SPARK-23534 Spark run on Hadoop 3.0.0 / SPARK-23807

Add Hadoop 3 profile with relevant POM fix ups


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: Build
    • Labels: None

    Description

      Hadoop 3, and in particular Hadoop 3.1, adds:

      • Java 8 as the minimum (and currently sole) supported Java version
      • A new "hadoop-cloud-storage" module intended to be a minimal dependency POM for all the cloud connectors in the Hadoop version being built against
      • The ability to declare a committer for any FileOutputFormat which supersedes the classic FileOutputCommitter, both for a whole job and for a specific filesystem URI (see the configuration sketch after this list)
      • A shaded client JAR, though not yet one complete enough for Spark.
      • Lots of other features and fixes.
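
      As an illustration of that committer binding, here is a minimal core-site.xml sketch, assuming the Hadoop 3.1 per-filesystem-scheme committer factory key and the S3A committer factory; the "directory" committer is just an example choice, not something this issue mandates:

        <configuration>
          <!-- sketch only: bind work written to s3a:// URIs to the S3A committer factory -->
          <property>
            <name>mapreduce.outputcommitter.factory.scheme.s3a</name>
            <value>org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory</value>
          </property>
          <!-- then choose which of the S3A committers that factory should create -->
          <property>
            <name>fs.s3a.committer.name</name>
            <value>directory</value>
          </property>
        </configuration>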

      The basic work of building Spark with Hadoop 3 is just a matter of running the build with -Dhadoop.version=3.x.y; however, that:

      • Doesn't build on SBT (dependency resolution of zookeeper JAR)
      • Misses the new cloud features

      The ZooKeeper dependency can be fixed everywhere by explicitly declaring the ZooKeeper artifact instead of relying on Curator to pull it in; this needs a profile to declare the right ZooKeeper version, obviously.
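
      A minimal sketch of what that could look like in the parent POM; the profile id, Hadoop version, and ZooKeeper version below are placeholders rather than the values finally committed:

        <!-- profile pinning hadoop.version and the matching ZooKeeper release -->
        <profile>
          <id>hadoop-3</id>
          <properties>
            <hadoop.version>3.1.0</hadoop.version>
            <zookeeper.version>3.4.9</zookeeper.version>
          </properties>
        </profile>

        <!-- and in <dependencyManagement>, declare ZooKeeper explicitly instead of
             relying on whatever version Curator drags in -->
        <dependency>
          <groupId>org.apache.zookeeper</groupId>
          <artifactId>zookeeper</artifactId>
          <version>${zookeeper.version}</version>
        </dependency>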

      To use the cloud features, the hadoop-3 profile should declare that the spark-hadoop-cloud module depends on (and only on) the hadoop/hadoop-cloud-storage module for its transitive dependencies on cloud storage, and add a source package which is only built and tested when building against Hadoop 3.1+.
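
      Roughly, and with the profile id and the exclusion list as placeholders, the hadoop-cloud module's POM would gain something along these lines:

        <!-- under the Hadoop 3 profile, pull the cloud connectors in solely through
             hadoop-cloud-storage, excluding artifacts Spark already provides -->
        <profile>
          <id>hadoop-3</id>
          <dependencies>
            <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-cloud-storage</artifactId>
              <version>${hadoop.version}</version>
              <exclusions>
                <exclusion>
                  <groupId>org.apache.hadoop</groupId>
                  <artifactId>hadoop-common</artifactId>
                </exclusion>
              </exclusions>
            </dependency>
          </dependencies>
        </profile>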

       




            People

              Assignee: Steve Loughran (stevel@apache.org)
              Reporter: Steve Loughran (stevel@apache.org)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: