  1. Spark
  2. SPARK-42537

Remove obsolete/superfluous imports in spark-hadoop-cloud module

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: Build
    • Labels: None

    Description

      The explicit imports into hadoop-cloud are obsolete:

      • the hadoop-cloud-storage pom is a cut-down export of the bindings to the various cloud stores in their hadoop-* modules
      • it's been shipping since Hadoop 2.10
      • it's grown to include COS and Aliyun support
      • it's fairly well tested
      • it actually drops connectors when their support is withdrawn (hadoop-openstack). Hadoop 3.3.5 has done this, leaving a stub JAR there just to avoid breaking downstream builds like Spark's current setup.

      hadoop-cloud-storage should be all that's needed.

      I know that the Spark hadoop-2 profile still references the long-unsupported Hadoop 2.7.x line, but if you are using those releases then you really aren't going to talk to cloud infrastructure:

      • no abfs connector
      • the s3n connector won't authenticate with any of the AWS regions launched in the past 5-8 years
      • the gcs connector won't work (it's Java 11+; Hadoop 3.2.x is the minimum for Java 11 clients)
      • none of the new Chinese cloud services
      • the s3a connector is very outdated
      • the s3a connector uses an unshaded AWS client, which is unlikely to work with versions of Jackson or httpclient released in the last 5 years, has trouble on Java 8, etc.

      Proposed

      • hadoop-2 profile: keep the minimal hadoop-aws and hadoop-azure dependencies that are in the code today. Cutting to the empty set would be better, but a bit more radical.
      • hadoop-3 profile: pull in hadoop-cloud-storage (excluding the AWS SDK, as today) and nothing else.

      This will simplify everyone's life as there are fewer dependencies to reconcile.
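
      As a rough illustration of the proposal, the two profiles in the hadoop-cloud module's pom might look something like the fragment below. This is a sketch, not the actual Spark build file: the ${hadoop.version} and ${hadoop.deps.scope} properties are assumed to be defined in a parent pom, and the excluded AWS SDK coordinates (com.amazonaws:aws-java-sdk-bundle) are an assumption about which artifact "excluding the AWS SDK" refers to.

      ```xml
      <!-- Sketch only: profile fragment for a hadoop-cloud module pom.xml.
           Property names and the excluded SDK artifact are assumptions. -->
      <profiles>
        <profile>
          <id>hadoop-3</id>
          <dependencies>
            <!-- Single aggregate dependency; it pulls in the per-store
                 hadoop-* connector modules transitively. -->
            <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-cloud-storage</artifactId>
              <version>${hadoop.version}</version>
              <scope>${hadoop.deps.scope}</scope>
              <exclusions>
                <!-- Keep the AWS SDK out so its version can be managed separately. -->
                <exclusion>
                  <groupId>com.amazonaws</groupId>
                  <artifactId>aws-java-sdk-bundle</artifactId>
                </exclusion>
              </exclusions>
            </dependency>
          </dependencies>
        </profile>
        <profile>
          <id>hadoop-2</id>
          <dependencies>
            <!-- Minimal set: only the two connectors named in the proposal. -->
            <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-aws</artifactId>
              <version>${hadoop.version}</version>
              <scope>${hadoop.deps.scope}</scope>
            </dependency>
            <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-azure</artifactId>
              <version>${hadoop.version}</version>
              <scope>${hadoop.deps.scope}</scope>
            </dependency>
          </dependencies>
        </profile>
      </profiles>
      ```

      With a layout like this, a connector added to or withdrawn from hadoop-cloud-storage upstream would be picked up by the hadoop-3 profile without any change on the Spark side.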

      See also SPARK-39969, which proposes making the hadoop-aws version of the aws-sdk-bundle the normative one, as it is now newer than the spark-kinesis import and more broadly tested.

            People

              Assignee: Unassigned
              Reporter: Steve Loughran (stevel@apache.org)
              Votes: 1
              Watchers: 3
