Spark / SPARK-23710

Upgrade the built-in Hive to 2.3.5 for hadoop-3.2


    Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      Spark fails to run on Hadoop 3.x because Hive's ShimLoader treats Hadoop 3.x as an unknown Hadoop version. See SPARK-18673 and HIVE-16081 for details. We therefore need to upgrade the built-in Hive for Hadoop 3.x. This umbrella JIRA tracks that upgrade.
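      To show why the version check fails, the Java sketch below mimics the kind of major-version switch in Hive 1.2.x's ShimLoader. It parses the Hadoop version string and rejects any major version it has no shim for. This is not Hive's source. The class name `ShimVersionCheck`, the `acceptHadoop3` flag and the shim name `"0.23"` are illustrative assumptions. The flag stands in for the kind of change HIVE-16081 makes for Hive 2.x, which is to accept major version 3.

```java
// Hypothetical sketch modeled on the Hadoop version check in Hive 1.2.x's
// ShimLoader; names and the shim string are assumptions, not Hive's code.
public class ShimVersionCheck {

    // Shim name assumed to be used for Hadoop 2.x (and, once accepted, 3.x).
    static final String HADOOP23_SHIM = "0.23";

    /**
     * Maps a Hadoop version such as "3.2.0" to a shim name.
     * acceptHadoop3 = false models the old behavior; true models the fix.
     * (Hadoop 1.x handling is omitted for brevity.)
     */
    static String majorVersion(String vers, boolean acceptHadoop3) {
        String[] parts = vers.split("\\.");
        if (parts.length < 2) {
            throw new RuntimeException("Illegal Hadoop Version: " + vers
                    + " (expected A.B.* format)");
        }
        int major = Integer.parseInt(parts[0]);
        if (major == 2 || (acceptHadoop3 && major == 3)) {
            return HADOOP23_SHIM;
        }
        // Any other major version is "unknown", which is the Hadoop 3.x failure.
        throw new IllegalArgumentException(
                "Unrecognized Hadoop major version number: " + vers);
    }

    public static void main(String[] args) {
        System.out.println(majorVersion("2.7.4", false)); // 0.23
        System.out.println(majorVersion("3.2.0", true));  // 0.23
        try {
            majorVersion("3.2.0", false);
        } catch (IllegalArgumentException e) {
            // Unrecognized Hadoop major version number: 3.2.0
            System.out.println(e.getMessage());
        }
    }
}
```

      Running `main` prints `0.23` twice, followed by the rejection message for `3.2.0`. That rejection corresponds to the failure described above.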

       

      Upgrade Plan:

      1. SPARK-27054 Remove the Calcite dependency to avoid some jar conflicts.
      2. SPARK-23749 Replace built-in Hive APIs (isSub/toKryo) and remove OrcProto.Type usage.
      3. SPARK-27158, SPARK-27130 Update dev/* to support dynamically changing profiles when testing.
      4. Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and compilation passes on Hive 2.3.4.
      5. Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step.
      6. Make the tests pass on Hadoop 3.1 with Hive 2.3.4.
      7. Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's TCLIService.thrift.
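      Steps 3 and 4 rely on the build choosing a Hive version based on the active Hadoop profile. The Java sketch below models that choice as a lookup table. It is a hypothetical illustration, not Spark's Maven or sbt logic. The class `BuiltInHiveSelector`, the `hadoop-2.7` profile name and the exact profile-to-version mapping are assumptions. The plan above targets 2.3.4, while the issue title gives the final version as 2.3.5.

```java
import java.util.Map;

// Hypothetical helper, not part of Spark's build: the active Hadoop profile
// decides the built-in Hive version and a version-specific source directory.
public class BuiltInHiveSelector {

    // Assumed mapping: hadoop-3.2 uses the upgraded Hive (2.3.5, per the issue
    // title); hadoop-2.7 keeps Hive 1.2.1.
    static final Map<String, String> HIVE_FOR_PROFILE = Map.of(
            "hadoop-2.7", "1.2.1",
            "hadoop-3.2", "2.3.5");

    static String builtInHive(String hadoopProfile) {
        String version = HIVE_FOR_PROFILE.get(hadoopProfile);
        if (version == null) {
            throw new IllegalArgumentException("Unknown Hadoop profile: " + hadoopProfile);
        }
        return version;
    }

    // Returns a directory such as "sql/core/v1.2.1", echoing the sub-task that
    // moves conflicting sql/core code to sql/core/v1.2.1 (naming for 2.3.5 assumed).
    static String versionedSourceDir(String module, String hadoopProfile) {
        return module + "/v" + builtInHive(hadoopProfile);
    }

    public static void main(String[] args) {
        for (String profile : new String[] {"hadoop-2.7", "hadoop-3.2"}) {
            System.out.println(profile + " -> Hive " + builtInHive(profile)
                    + ", sources in " + versionedSourceDir("sql/core", profile));
        }
    }
}
```

      Keeping the mapping in one table means that switching profiles changes both the Hive dependency and the source root together. The dev/* updates in step 3 aim for that same consistency when tests run.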

       

      I have completed the initial work and plan to finish this upgrade step by step.
       

       

        Issue Links

        1. Remove Calcite dependency (Sub-task, Resolved, Yuming Wang)
        2. Replace built-in Hive API (isSub/toKryo) and remove OrcProto.Type usage (Sub-task, Resolved, Yuming Wang)
        3. Automatically select profile when executing sbt-checkstyle (Sub-task, Resolved, Yuming Wang)
        4. dev/mima and dev/scalastyle support dynamic profiles (Sub-task, Resolved, Yuming Wang)
        5. Reduce the code duplicate when upgrading built-in Hive (Sub-task, Resolved, Yuming Wang)
        6. Move the conflict source code of the sql/core module to sql/core/v1.2.1 (Sub-task, Resolved, Yuming Wang)
        7. Dealing with TimeVars removed in Hive 2.x (Sub-task, Resolved, Yuming Wang)
        8. Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 (Sub-task, Resolved, Yuming Wang)
        9. Upgrade hadoop-3 to 3.2.0 (Sub-task, Resolved, Yuming Wang)
        10. Exclude javax.ws.rs:jsr311-api from hadoop-client (Sub-task, Resolved, Yuming Wang)
        11. Fix testing issues with yarn module in Hadoop-3 (Sub-task, Resolved, Yuming Wang)
        12. Avoid using hard-coded jar names in Hive tests (Sub-task, Resolved, Yuming Wang)
        13. Exclude commons-httpclient when interacting with different versions of the HiveMetastoreClient (Sub-task, Resolved, Unassigned)
        14. Fix hadoop-3.2 test issue(except the hive-thriftserver module) (Sub-task, Resolved, Yuming Wang)
        15. Move incompatible code from the hive-thriftserver module to sql/hive-thriftserver/v1.2.1 (Sub-task, Resolved, Yuming Wang)
        16. Upgrade commons-logging to 1.1.3 (Sub-task, Resolved, Yuming Wang)
        17. parquet-hadoop-bundle is incorrect in dev/deps/spark-deps-hadoop-3.2 (Sub-task, Closed, Unassigned)
        18. Exclude com.zaxxer:HikariCP-java7 from hadoop-yarn-server-web-proxy (Sub-task, Resolved, Yuming Wang)
        19. Beeline should show database in the prompt (Sub-task, Resolved, Unassigned)
        20. Upgrade to Hive 2.3.5 for Hive Metastore Client and Hadoop-3.2 profile (Sub-task, Resolved, Yuming Wang)
        21. Upgrade the built-in Hive to 2.3.5 for hadoop-3.2 (Sub-task, Resolved, Unassigned)
        22. hadoop-3.2 support hive-thriftserver (Sub-task, Resolved, Yuming Wang)
        23. Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version (Sub-task, Resolved, Yuming Wang)
        24. Update building-spark.md (Sub-task, Resolved, Yuming Wang)
        25. Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal (Sub-task, Resolved, Yuming Wang)
        26. Update LICENSE and NOTICE for Hive 2.3 (Sub-task, Resolved, Yuming Wang)
        27. --jar argument with spark-sql failed to load the jars to driver classpath (Sub-task, Resolved, Sandeep Katta)
        28. drop database throws Exception (Sub-task, Resolved, Unassigned)
        29. orc library is incorrect in dev/deps/spark-deps-hadoop-3.2 (Sub-task, Closed, Unassigned)
        30. dev/deps/spark-deps-hadoop-3.2 orc jar is incorrect (Sub-task, Resolved, angerszhu)
        31. Hive 2.3 profile should still use orc-nohive (Sub-task, Closed, Unassigned)
        32. Migration guide for Hive 2.3 (Sub-task, Resolved, wuyi)

            People

            • Assignee: Yuming Wang (yumwang)
            • Reporter: Yuming Wang (yumwang)
