Description
Spark fails to run on Hadoop 3.x because Hive's ShimLoader treats Hadoop 3.x as an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for details. We therefore need to upgrade the built-in Hive to support Hadoop 3.x. This is an umbrella JIRA to track that upgrade.
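For context, here is a simplified Scala sketch of the Hadoop major-version check that Hive 1.2.x's ShimLoader performs (the real code is Java, in org.apache.hadoop.hive.shims.ShimLoader; the object and constant names below are illustrative, not the actual Hive source):

```scala
// Illustrative sketch of Hive 1.2.x's Hadoop version detection.
object ShimLoaderSketch {
  private val Hadoop20SVersionName = "0.20S"
  private val Hadoop23VersionName  = "0.23"

  def getMajorVersion(hadoopVersion: String): String = {
    val parts = hadoopVersion.split("\\.")
    require(parts.length >= 2,
      s"Illegal Hadoop version: $hadoopVersion (expected A.B.* format)")
    parts(0).toInt match {
      case 1 => Hadoop20SVersionName
      case 2 => Hadoop23VersionName
      case _ =>
        // Hadoop 3.x falls into this branch, so the Hive shims cannot be loaded.
        throw new IllegalArgumentException(
          s"Unrecognized Hadoop major version number: $hadoopVersion")
    }
  }
}
```

For example, `ShimLoaderSketch.getMajorVersion("3.1.0")` throws, which is why the built-in Hive 1.2.1 cannot work on Hadoop 3.x without this upgrade.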
Upgrade Plan:
- SPARK-27054 Remove the Calcite dependency. This avoids some jar conflicts.
- SPARK-23749 Replace built-in Hive APIs (isSub/toKryo) and remove the OrcProto.Type usage (see the sketch after this list).
- SPARK-27158, SPARK-27130 Update dev/* to support dynamically changing profiles when testing.
- Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and compilation passes on Hive 2.3.4.
- Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step.
- Make the Hadoop-3.1 build with Hive 2.3.4 pass the tests.
- Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's TCLIService.thrift.
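To illustrate the SPARK-23749 item above, here is a minimal sketch of what replacing a built-in Hive path helper such as isSub with a Spark-side implementation could look like (the object and method names are hypothetical, not the actual Spark API or the actual patch):

```scala
import org.apache.hadoop.fs.Path

// Hypothetical Spark-side replacement for a Hive "isSub"-style helper.
object PathUtilsSketch {
  /** Returns true if `child` lies under `parent`, by comparing URI string prefixes. */
  def isSubDirectory(parent: Path, child: Path): Boolean = {
    val parentUri = parent.toUri.toString.stripSuffix(Path.SEPARATOR) + Path.SEPARATOR
    child.toUri.toString.startsWith(parentUri)
  }
}
```

Keeping such helpers in Spark avoids relying on Hive internals that may differ between Hive 1.2.1 and 2.3.4.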
I have completed the initial work and plan to finish this upgrade step by step.
Issue Links
- blocks
  - SPARK-27361 YARN support for GPU-aware scheduling (Resolved)
- is related to
  - SPARK-31113 Support DDL "SHOW VIEWS" (Resolved)
  - SPARK-12014 Spark SQL query containing semicolon is broken in Beeline (related to HIVE-11100) (Resolved)
  - SPARK-24766 CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column stats in parquet (Resolved)
- relates to
  - SPARK-27500 Add tests for built-in Hive 2.3 (Resolved)
- supercedes
  - SPARK-27377 Upgrade YARN to 3.1.2+ to support GPU (Resolved)
  - SPARK-24472 Orc RecordReaderFactory throws IndexOutOfBoundsException (Resolved)
  - SPARK-28748 0 as decimal(n, n) in Hive tables shows as NULL in Spark (Closed)