Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 3.2.1
- Fix Version/s: None
- Component/s: None
Description
Duplicate records appear when spark-sql overwrites a Hive table. When the Spark job has failure stages, the output table ends up with duplicates, even though the DataFrame itself has none. When I run the job again, the result is correct. This confuses me: why does it happen?
e.g.: dataFrame.write().mode(SaveMode.Overwrite).insertInto("outputTable");
There are no duplicate records in the DataFrame, but duplicate records exist in the Hive outputTable.