Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None
    • Target Version/s:

Description

When using saveAsTable in append mode, data is written to the wrong location for non-managed Datasource tables. The example below illustrates this: after the append, the query for partition A = 1 should return 2 rows, but it still returns 1 because the appended data never lands under the table's path.

It seems we somehow pass the wrong table path from DataFrameWriter to InsertIntoHadoopFsRelation. We should also probably remove the repair table call at the end of saveAsTable in DataFrameWriter; that shouldn't be needed in either the Hive or the Datasource case.

      scala> spark.sqlContext.range(100).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test")
      
      scala> sql("create table test (id long, A int, B int) USING parquet OPTIONS (path '/tmp/test') PARTITIONED BY (A, B)")
      
      scala> sql("msck repair table test")
      
      scala> sql("select * from test where A = 1").count
      res6: Long = 1
      
      scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").saveAsTable("test")
      
      scala> sql("select * from test where A = 1").count
      res8: Long = 1
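
One way to confirm where the appended rows actually landed is to compare the table's registered location against the files on disk. The following is a minimal sketch, not part of the original report: DESCRIBE FORMATTED prints the location the catalog records for the table, and the directory listing shows which partition directories under the external path /tmp/test physically received data.

      scala> // Show the table's catalog metadata, including its Location
      scala> sql("describe formatted test").collect().foreach(println)

      scala> // List the partition directories actually present under the external path
      scala> import java.io.File
      scala> new File("/tmp/test").listFiles.foreach(println)

If the bug is present, the A=1 partition written by the append should be missing from the listing above, having been written under some other location instead.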
      

People

    • Assignee: ekhliang Eric Liang
    • Reporter: ekhliang Eric Liang
    • Votes: 0
    • Watchers: 3
