[SPARK-38976] spark-sql. overwrite. hive table-duplicate records - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: 3.2.1
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

Duplicate records appear when spark-sql overwrites a Hive table and the Spark job has failed stages, even though the DataFrame itself contains no duplicate records. When I run the job again, the result is correct. This confuses me. Why does this happen?

e.g.: dataFrame.write().mode(SaveMode.Overwrite).insertInto("outputTable");

There are no duplicate records in dataFrame, but duplicate records exist in the Hive table outputTable.
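The comparison described above (no duplicates in dataFrame, duplicates in outputTable) amounts to comparing how often each row occurs on each side. The following is a minimal plain-Java sketch of that check, assuming both sides have already been collected into lists and each row is modeled as a single string. DuplicateCheck, countRows and extraCopies are hypothetical names, not part of Spark. With real data, the same comparison would more likely be done in Spark itself (for example with groupBy and count on both datasets) rather than by collecting rows to the driver.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateCheck {

    // Count how many times each row occurs.
    // Assumption: a row is modeled as one string (hypothetical simplification).
    static Map<String, Integer> countRows(List<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    // Return the rows that occur more often in the written table than in the
    // source, mapped to the number of extra copies (sorted by row for readability).
    static Map<String, Integer> extraCopies(List<String> source, List<String> written) {
        Map<String, Integer> sourceCounts = countRows(source);
        Map<String, Integer> extra = new TreeMap<>();
        for (Map.Entry<String, Integer> e : countRows(written).entrySet()) {
            int diff = e.getValue() - sourceCounts.getOrDefault(e.getKey(), 0);
            if (diff > 0) {
                extra.put(e.getKey(), diff);
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        // Hypothetical data: the source has no duplicates, the written table does.
        List<String> source = List.of("a", "b", "c");
        List<String> written = List.of("a", "b", "b", "c", "c");
        System.out.println(extraCopies(source, written)); // prints {b=1, c=1}
    }
}
```

An empty result means the written table holds no more copies of any row than the source did; a non-empty result lists exactly which rows were duplicated by the write.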

People

Assignee: Unassigned

Reporter: wesharn

Votes: 0

Watchers: 1

Dates

Created: 21/Apr/22 02:34

Updated: 12/Dec/22 18:11

Resolved: 26/Apr/22 03:29