Description
Spark fails to run on Hadoop 3.x because Hive's ShimLoader treats Hadoop 3.x as an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for details. We therefore need to upgrade the built-in Hive to support Hadoop 3.x. This is an umbrella JIRA to track that upgrade.
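For context, here is a simplified Scala sketch of the Hadoop major-version check that Hive 1.2.x's ShimLoader performs (the real code is Java, in org.apache.hadoop.hive.shims.ShimLoader; the object and constant names below are illustrative, not the actual Hive source):

```scala
// Illustrative sketch of Hive 1.2.x's Hadoop version detection.
object ShimLoaderSketch {
  private val Hadoop20SVersionName = "0.20S"
  private val Hadoop23VersionName  = "0.23"

  def getMajorVersion(hadoopVersion: String): String = {
    val parts = hadoopVersion.split("\\.")
    require(parts.length >= 2,
      s"Illegal Hadoop version: $hadoopVersion (expected A.B.* format)")
    parts(0).toInt match {
      case 1 => Hadoop20SVersionName
      case 2 => Hadoop23VersionName
      case _ =>
        // Hadoop 3.x falls into this branch, so the Hive shims cannot be loaded.
        throw new IllegalArgumentException(
          s"Unrecognized Hadoop major version number: $hadoopVersion")
    }
  }
}
```

For example, `ShimLoaderSketch.getMajorVersion("3.1.0")` throws, which is why the built-in Hive 1.2.1 cannot work on Hadoop 3.x without this upgrade.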
Upgrade Plan:
- SPARK-27054 Remove the Calcite dependency. This avoids some jar conflicts.
- SPARK-23749 Replace built-in Hive APIs (isSub/toKryo) and remove the OrcProto.Type usage (see the sketch after this list).
- SPARK-27158, SPARK-27130 Update dev/* to support dynamically changing profiles when testing.
- Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and compilation passes on Hive 2.3.4.
- Add an empty hive-thriftserverV2 module so that all test cases can be run in the next step.
- Make the Hadoop-3.1 build with Hive 2.3.4 pass the tests.
- Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's TCLIService.thrift.
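To illustrate the SPARK-23749 item above, here is a minimal sketch of what replacing a built-in Hive path helper such as isSub with a Spark-side implementation could look like (the object and method names are hypothetical, not the actual Spark API or the actual patch):

```scala
import org.apache.hadoop.fs.Path

// Hypothetical Spark-side replacement for a Hive "isSub"-style helper.
object PathUtilsSketch {
  /** Returns true if `child` lies under `parent`, by comparing URI string prefixes. */
  def isSubDirectory(parent: Path, child: Path): Boolean = {
    val parentUri = parent.toUri.toString.stripSuffix(Path.SEPARATOR) + Path.SEPARATOR
    child.toUri.toString.startsWith(parentUri)
  }
}
```

Keeping such helpers in Spark avoids relying on Hive internals that may differ between Hive 1.2.1 and 2.3.4.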
I have completed the initial work and plan to finish this upgrade step by step.
Issue Links
- blocks
  - SPARK-27361 YARN support for GPU-aware scheduling (Resolved)
- is related to
  - SPARK-31113 Support DDL "SHOW VIEWS" (Resolved)
  - SPARK-12014 Spark SQL query containing semicolon is broken in Beeline (related to HIVE-11100) (Resolved)
  - SPARK-24766 CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column stats in parquet (Resolved)
- relates to
  - SPARK-27500 Add tests for built-in Hive 2.3 (Resolved)
- supercedes
  - SPARK-27377 Upgrade YARN to 3.1.2+ to support GPU (Resolved)
  - SPARK-24472 Orc RecordReaderFactory throws IndexOutOfBoundsException (Resolved)
  - SPARK-28748 0 as decimal(n, n) in Hive tables shows as NULL in Spark (Closed)