  1. Spark
  2. SPARK-23710

Upgrade the built-in Hive to 2.3.5 for hadoop-3.2

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark fails to run on Hadoop 3.x because Hive's ShimLoader treats Hadoop 3.x as an unknown Hadoop version. See SPARK-18673 and HIVE-16081 for more details. The built-in Hive therefore needs to be upgraded for Hadoop 3.x. This is an umbrella JIRA to track that upgrade.
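
      The failure mode described above can be sketched as a major-version gate. This is a simplified illustration only, not Hive's actual ShimLoader code: the function name `select_shim`, the shim name "0.23", and the version table are assumptions made for this example.

```python
# Simplified illustration of a major-version gate.
# NOT Hive's actual ShimLoader code: select_shim, the "0.23" shim name,
# and the version table are hypothetical names chosen for this sketch.
SUPPORTED_MAJOR_VERSIONS = {"2": "0.23"}  # Hadoop major version -> shim name


def select_shim(hadoop_version, supported=SUPPORTED_MAJOR_VERSIONS):
    """Return the shim name for a Hadoop version string such as '2.7.4'."""
    parts = hadoop_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"Illegal Hadoop version: {hadoop_version}")
    major = parts[0]
    if major not in supported:
        # An older table that predates Hadoop 3 ends up here for "3.x".
        raise ValueError(
            f"Unrecognized Hadoop major version number: {hadoop_version}"
        )
    return supported[major]


if __name__ == "__main__":
    print(select_shim("2.7.4"))
    try:
        select_shim("3.2.0")
    except ValueError as err:
        print(f"rejected: {err}")
    # A table that also knows about major version 3 accepts the same input.
    print(select_shim("3.2.0", {"2": "0.23", "3": "0.23"}))
```

      Under these assumptions, the check rejects any major version missing from the table, so the only remedy is a library release whose table includes it. That is why the plan below upgrades the built-in Hive rather than patching Spark.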

       

      Upgrade Plan:

      1. SPARK-27054: Remove the Calcite dependency to avoid some jar conflicts.
      2. SPARK-23749: Replace built-in Hive API usage (isSub/toKryo) and remove OrcProto.Type usage.
      3. SPARK-27158, SPARK-27130: Update dev/* to support dynamically changing profiles when testing.
      4. Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and compilation passes on Hive 2.3.4.
      5. Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step.
      6. Make the tests pass on Hadoop-3.1 with Hive 2.3.4.
      7. Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's TCLIService.thrift.

      I have completed the initial work and plan to finish this upgrade step by step.

        Issue Links

          1. Remove Calcite dependency (Sub-task, Resolved, Yuming Wang)
          2. Replace built-in Hive API (isSub/toKryo) and remove OrcProto.Type usage (Sub-task, Resolved, Yuming Wang)
          3. Automatically select profile when executing sbt-checkstyle (Sub-task, Resolved, Yuming Wang)
          4. dev/mima and dev/scalastyle support dynamic profiles (Sub-task, Resolved, Yuming Wang)
          5. Reduce the code duplicate when upgrading built-in Hive (Sub-task, Resolved, Yuming Wang)
          6. Move the conflict source code of the sql/core module to sql/core/v1.2.1 (Sub-task, Resolved, Yuming Wang)
          7. Dealing with TimeVars removed in Hive 2.x (Sub-task, Resolved, Yuming Wang)
          8. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 (Sub-task, Resolved, Yuming Wang)
          9. Upgrade hadoop-3 to 3.2.0 (Sub-task, Resolved, Yuming Wang)
          10. Exclude javax.ws.rs:jsr311-api from hadoop-client (Sub-task, Resolved, Yuming Wang)
          11. Fix testing issues with yarn module in Hadoop-3 (Sub-task, Resolved, Yuming Wang)
          12. Avoid using hard-coded jar names in Hive tests (Sub-task, Resolved, Yuming Wang)
          13. Exclude commons-httpclient when interacting with different versions of the HiveMetastoreClient (Sub-task, Resolved, Unassigned)
          14. Fix hadoop-3.2 test issue(except the hive-thriftserver module) (Sub-task, Resolved, Yuming Wang)
          15. Move incompatible code from the hive-thriftserver module to sql/hive-thriftserver/v1.2.1 (Sub-task, Resolved, Yuming Wang)
          16. Upgrade commons-logging to 1.1.3 (Sub-task, Resolved, Yuming Wang)
          17. parquet-hadoop-bundle is incorrect in dev/deps/spark-deps-hadoop-3.2 (Sub-task, Closed, Unassigned)
          18. Exclude com.zaxxer:HikariCP-java7 from hadoop-yarn-server-web-proxy (Sub-task, Resolved, Yuming Wang)
          19. Beeline should show database in the prompt (Sub-task, Resolved, Unassigned)
          20. Upgrade to Hive 2.3.5 for Hive Metastore Client and Hadoop-3.2 profile (Sub-task, Resolved, Yuming Wang)
          21. Upgrade the built-in Hive to 2.3.5 for hadoop-3.2 (Sub-task, Resolved, Unassigned)
          22. hadoop-3.2 support hive-thriftserver (Sub-task, Resolved, Yuming Wang)
          23. Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version (Sub-task, Resolved, Yuming Wang)
          24. Update building-spark.md (Sub-task, Resolved, Yuming Wang)
          25. Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal (Sub-task, Resolved, Yuming Wang)
          26. Update LICENSE and NOTICE for Hive 2.3 (Sub-task, Resolved, Yuming Wang)
          27. --jar argument with spark-sql failed to load the jars to driver classpath (Sub-task, Resolved, Sandeep Katta)
          28. drop database throws Exception (Sub-task, Resolved, Unassigned)
          29. orc library is incorrect in dev/deps/spark-deps-hadoop-3.2 (Sub-task, Closed, Unassigned)
          30. dev/deps/spark-deps-hadoop-3.2 orc jar is incorrect (Sub-task, Resolved, angerszhu)
          31. Hive 2.3 profile should still use orc-nohive (Sub-task, Closed, Unassigned)
          32. Migration guide for Hive 2.3 (Sub-task, Resolved, wuyi)

            People

              Assignee: Yuming Wang (yumwang)
              Reporter: Yuming Wang (yumwang)
              Votes: 13
              Watchers: 43