SPARK-14459: SQL partitioning must match existing tables, but is not checked.


      Description

      Writing into partitioned Hive tables has unexpected results because the table's partitioning is not detected and applied during the analysis phase.

      For example, if I have two tables, source and partitioned, with the same column types:

      CREATE TABLE source (id bigint, data string, part string);
      CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part string);
      
      // copy from source to partitioned
      sqlContext.table("source").write.insertInto("partitioned")
      

      Copying from source to partitioned succeeds, but the destination table ends up with 0 rows. The insert works if I partition explicitly by adding ...write.partitionBy("part").insertInto(...), as spelled out in the sketch below. This work-around isn't obvious, and it is error-prone because the columns passed to partitionBy must match the table's partitioning, yet that match is never checked.
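      For concreteness, here is the work-around spelled out as a minimal sketch, assuming the source and partitioned tables created above and the Spark 1.6-era DataFrameWriter API used in this report:

      // Work-around: declare the partitioning explicitly so the write uses
      // dynamic partitioning. The columns passed to partitionBy must match
      // the table's PARTITIONED BY clause, and that match is not checked.
      sqlContext.table("source")
        .write
        .partitionBy("part")
        .insertInto("partitioned")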

      I think that when relations are resolved, the target table's partitioning should be checked against any user-supplied partitioning, and applied automatically if none was set.
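      As a minimal sketch of that check, assuming the analyzer can see the catalog's partition columns when the target relation is resolved (the names below are illustrative, not Spark internals):

      // Hypothetical helper: reconcile user-supplied partitionBy columns with
      // the table's partition columns at analysis time. Not Spark's actual API.
      def resolvePartitioning(
          tableParts: Seq[String],
          userParts: Option[Seq[String]]): Seq[String] = userParts match {
        // partitionBy was not called: apply the table's partitioning
        case None => tableParts
        // partitionBy matches the table's partition columns: accept it
        case Some(cols) if cols.map(_.toLowerCase) == tableParts.map(_.toLowerCase) =>
          cols
        // otherwise, fail analysis instead of silently writing 0 rows
        case Some(cols) => throw new IllegalArgumentException(
          s"partitionBy(${cols.mkString(", ")}) must match table partitioning: " +
          tableParts.mkString(", "))
      }

      With a check like this, the example above would get partitionBy("part") applied automatically, and a mismatched partitionBy would fail during analysis rather than producing an empty table.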

      People

      • Assignee: Ryan Blue (rdblue)
      • Reporter: Ryan Blue (rdblue)
      • Votes: 0
      • Watchers: 5
