Apache Hudi / HUDI-516

Avoid need to import spark-avro package when submitting Hudi job in spark

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: Usability
    • Labels: None

    Description

      We are migrating Hudi to Spark 2.4.4 and replacing the deprecated databricks-avro with spark-avro in https://github.com/apache/incubator-hudi/pull/1005/

      After this change, users will have to pull in spark-avro explicitly when starting spark-shell, using:

      --packages org.apache.spark:spark-avro_2.11:2.4.4
      

      This is because hudi-spark-bundle does not currently shade spark-avro. One reason is that we are not sure of the implications of shading a Spark dependency inside a jar that is itself submitted to Spark. vinoth pointed out a possible concern: the bundle would always shade spark-avro 2.4.4, which could affect users running higher versions of Spark.
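
      To illustrate the version-coupling concern, here is a minimal sketch of how a user might derive the spark-avro coordinate from the Scala and Spark versions they actually run, rather than relying on a fixed 2.4.4. The helper name spark_avro_coordinate is hypothetical and not part of Hudi or Spark; it assumes the coordinate follows the org.apache.spark:spark-avro_<scala>:<spark> pattern seen in the command above. The script only prints the resulting command and does not launch Spark.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hudi or Spark): build the spark-avro
# Maven coordinate from a Scala binary version and a Spark version.
spark_avro_coordinate() {
  # $1 = Scala binary version (e.g. 2.11), $2 = Spark version (e.g. 2.4.4)
  printf 'org.apache.spark:spark-avro_%s:%s\n' "$1" "$2"
}

# Versions taken from this issue; adjust to match the cluster in use.
SPARK_AVRO_PKG="$(spark_avro_coordinate 2.11 2.4.4)"

# Print the launch command instead of running it.
echo "spark-shell --packages ${SPARK_AVRO_PKG}"
```

      With the versions from this issue, the printed command matches the --packages line shown above. A bundle that shades a fixed spark-avro version would not pick up a change to the second argument.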

      This Jira is to come up with a way to solve this usability issue.

       

          People

            Assignee: lamber-ken
            Reporter: Udit Mehrotra (uditme)