Spark / SPARK-9932 Data source API improvement (Spark 1.6) / SPARK-8887

Explicitly define which data types can be used as dynamic partition columns


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      InsertIntoHadoopFsRelation implements Hive-compatible dynamic partitioning insertion, which uses String.valueOf to encode partition column values into dynamic partition directory names. This effectively limits the data types that can be used as partition columns: for example, the string representation of StructType values is not well defined. However, this limitation is not explicitly enforced.

      There are several things we can improve:

      1. Enforce dynamic partition column data type requirements by adding analysis rules that throw AnalysisException when a violation occurs.
      2. Abstract away the string representation of the various data types, so that we don't need to convert internal representation types (e.g. UTF8String) to external types (e.g. String). A set of Hive-compatible implementations should be provided to ensure compatibility with Hive.
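      The String.valueOf-based encoding described above can be illustrated with a small Java sketch. All class and method names here are hypothetical, and the escaped character set is only a representative subset of what Hive actually escapes; this is a conceptual illustration, not Spark's implementation (which is Scala).

      ```java
      public class PartitionPathSketch {

          // A representative subset of characters that are unsafe in
          // partition directory names (Hive escapes these and more).
          private static final String UNSAFE = "\"#%'*/:=?\\";

          // Render one partition value as a path-safe string.
          static String escapePartitionValue(Object value) {
              if (value == null) {
                  // Hive's placeholder for NULL partition values.
                  return "__HIVE_DEFAULT_PARTITION__";
              }
              // The encoding the issue describes: String.valueOf, then escape.
              String s = String.valueOf(value);
              StringBuilder sb = new StringBuilder();
              for (char c : s.toCharArray()) {
                  if (c < 32 || UNSAFE.indexOf(c) >= 0) {
                      sb.append(String.format("%%%02X", (int) c));
                  } else {
                      sb.append(c);
                  }
              }
              return sb.toString();
          }

          // Build a dynamic partition directory path ("k1=v1/k2=v2/...")
          // from parallel arrays of column names and values.
          static String partitionPath(String[] names, Object[] values) {
              StringBuilder sb = new StringBuilder();
              for (int i = 0; i < names.length; i++) {
                  if (i > 0) sb.append('/');
                  sb.append(names[i]).append('=')
                    .append(escapePartitionValue(values[i]));
              }
              return sb.toString();
          }

          public static void main(String[] args) {
              System.out.println(partitionPath(
                  new String[]{"year", "event"},
                  new Object[]{2015, "a/b"}));
              // prints year=2015/event=a%2Fb
          }
      }
      ```

      Note how the sketch makes the problem concrete: String.valueOf produces a sensible path segment for an Integer, but for a complex value such as a struct it would fall back to whatever toString happens to return, which is exactly the ill-defined representation the issue wants to rule out at analysis time.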

          People

            Assignee: Yijie Shen (yijieshen)
            Reporter: Cheng Lian (liancheng)
            Votes: 0
            Watchers: 3
