Spark / SPARK-16410

DataFrameWriter's jdbc method drops table in overwrite mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.1, 1.6.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Important

    Description

      According to the API documentation, the overwrite save mode should overwrite the existing data, which suggests that only the data is removed, i.e. the table is truncated.
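
      For illustration, a minimal invocation that runs into this behaviour could look as follows; the DataFrame df, the connection URL, the table name, and the credentials are placeholders:

      import java.util.Properties
      import org.apache.spark.sql.SaveMode

      // Overwrite an existing JDBC table; one would expect only the rows to be
      // replaced, not the table definition itself.
      val props = new Properties()
      props.setProperty("user", "etl")
      props.setProperty("password", "secret")

      df.write
        .mode(SaveMode.Overwrite)
        .jdbc("jdbc:postgresql://db-host:5432/warehouse", "accounts", props)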

      However, that is not what happens in the source code:

      if (mode == SaveMode.Overwrite && tableExists) {
        JdbcUtils.dropTable(conn, table)
        tableExists = false
      }
      

      This clearly shows that the table is first dropped and then recreated. This causes two major issues:

      • Existing indexes, partitioning schemes, etc. are completely lost.
      • The case of identifiers may be changed without the user understanding why.

      In my opinion, the table should be truncated, not dropped. Overwriting data is a DML operation and should not cause DDL.
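
      As a sketch of what a truncate-based overwrite could look like from the user's side, assuming the target table already exists and the database supports TRUNCATE TABLE (the URL, credentials, and table name are again placeholders, and df is an existing DataFrame):

      import java.sql.DriverManager
      import java.util.Properties
      import org.apache.spark.sql.SaveMode

      val url = "jdbc:postgresql://db-host:5432/warehouse"
      val table = "accounts"

      // Empty the table with plain JDBC so indexes, partitioning, and the case
      // of identifiers are preserved.
      val conn = DriverManager.getConnection(url, "etl", "secret")
      try {
        conn.createStatement().executeUpdate(s"TRUNCATE TABLE $table")
      } finally {
        conn.close()
      }

      // Append into the now-empty table; Append never drops or recreates it.
      val props = new Properties()
      props.setProperty("user", "etl")
      props.setProperty("password", "secret")

      df.write
        .mode(SaveMode.Append)
        .jdbc(url, table, props)

      This is only a workaround; the proposal here is that SaveMode.Overwrite itself should issue the TRUNCATE instead of a DROP and CREATE.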

            People

              Assignee: Unassigned
              Reporter: Ian Hellstrom (hellstorm)
              Votes: 0
              Watchers: 2
