Spark / SPARK-23710

Upgrade the built-in Hive to 2.3.5 for hadoop-3.2


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark fails to run on Hadoop 3.x because Hive's ShimLoader considers Hadoop 3.x to be an unknown Hadoop version. See SPARK-18673 and HIVE-16081 for more details. We therefore need to upgrade the built-in Hive to support Hadoop 3.x. This is an umbrella JIRA to track this upgrade.
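      To make the failure mode concrete, here is a minimal Python sketch, assuming a simplified model of the version check in Hive's ShimLoader: the Hadoop version string is split on "." and the major number selects a shim, so any major version the loader was not written for (such as 3) is rejected. The function name `major_version_shim` and the shim labels are hypothetical; this is an illustration, not Hive's source code.

```python
# Hypothetical, simplified model of a ShimLoader-style Hadoop version check.
# This illustrates the failure mode only; it is not Hive's actual code.

# Assumed mapping from Hadoop major version to a shim label (labels are illustrative).
KNOWN_SHIMS = {
    1: "Hadoop 1.x shim",
    2: "Hadoop 2.x shim",
}


def major_version_shim(hadoop_version: str) -> str:
    """Return the shim label for a Hadoop version string such as '2.7.4'."""
    parts = hadoop_version.split(".")
    if len(parts) < 2:
        raise ValueError(
            f"Illegal Hadoop version: {hadoop_version} (expected A.B.* format)"
        )
    major = int(parts[0])
    if major not in KNOWN_SHIMS:
        # Hadoop 3.x ends up here: the loader has no shim for major version 3.
        raise ValueError(f"Unrecognized Hadoop major version number: {hadoop_version}")
    return KNOWN_SHIMS[major]


if __name__ == "__main__":
    for version in ("2.7.4", "3.2.0"):
        try:
            print(version, "->", major_version_shim(version))
        except ValueError as err:
            print(version, "->", err)
```

      Running the script resolves a shim for 2.7.4 and reports an unrecognized major version for 3.2.0. In this model the only remedy is a loader that knows about major version 3, which corresponds to upgrading the built-in Hive rather than patching Spark.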

       

      Upgrade Plan:

      1. SPARK-27054 Remove the Calcite dependency. This avoids some JAR conflicts.
      2. SPARK-23749 Replace the built-in Hive API (isSub/toKryo) and remove OrcProto.Type usage
      3. SPARK-27158, SPARK-27130 Update dev/* to support dynamically changing profiles when testing
      4. Fix the ORC dependency conflict so that tests pass with Hive 1.2.1 and the build compiles with Hive 2.3.4
      5. Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step
      6. Make the tests pass on Hadoop 3.1 with Hive 2.3.4
      7. Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's TCLIService.thrift

       

      I have completed the initial work and plan to finish this upgrade step by step.
       

       


          There are no Sub-Tasks for this issue.


            People

              Assignee: Yuming Wang (yumwang)
              Reporter: Yuming Wang (yumwang)
              Votes: 13
              Watchers: 43
