Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.0
Description
Writing into partitioned Hive tables has unexpected results because the table's partitioning is not detected and applied during the analysis phase.
For example, if I have two tables, source and partitioned, with the same column types:
CREATE TABLE source (id bigint, data string, part string); CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part string); // copy from source to partitioned sqlContext.table("source").write.insertInto("partitioned")
Copying from source to partitioned succeeds, but results in 0 rows. This works if I explicitly partition by adding ...write.partitionBy("part").insertInto(...). This work-around isn't obvious and is prone to error because the partitionBy must match the table's partitioning, though it is not checked.
I think when relations are resolved, the partitioning should be checked and updated if it isn't set.