  1. Spark
  2. SPARK-38976

spark-sql: overwriting a Hive table produces duplicate records

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Duplicate records appeared when spark-sql overwrote a Hive table. This happened when the Spark job had failed stages, even though the DataFrame itself contained no duplicate records. When I ran the job again, the result was correct. This confuses me. Why does it happen?

      Example: dataFrame.write().mode(SaveMode.Overwrite).insertInto("outputTable");

      The DataFrame contains no duplicate records, but the Hive table outputTable does.
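
The claim that one side has duplicates and the other does not can be checked by counting surplus rows on each side. The following is a minimal sketch, not taken from the report: the class name `DuplicateCheck` and the helper `countDuplicateRows` are hypothetical, and it assumes a Hive-enabled Spark session that can read the table `outputTable` named above. It treats two rows as duplicates only when every column matches.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DuplicateCheck {

    // Hypothetical helper (not part of the Spark API): returns how many rows
    // are surplus copies, i.e. total rows minus distinct rows.
    static long countDuplicateRows(Dataset<Row> rows) {
        long total = rows.count();
        long distinct = rows.dropDuplicates().count();
        return total - distinct;
    }

    public static void main(String[] args) {
        // Assumes the cluster is configured with Hive support.
        SparkSession spark = SparkSession.builder()
                .appName("duplicate-check")
                .enableHiveSupport()
                .getOrCreate();

        // "outputTable" is the table name used in the report.
        Dataset<Row> written = spark.table("outputTable");
        System.out.println("duplicate rows in outputTable: " + countDuplicateRows(written));

        spark.stop();
    }
}
```

Calling the same helper on the DataFrame just before the write and on the table just after it gives two numbers that can be compared directly, for a failed-stage run and for a clean rerun.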

          People

            Assignee: Unassigned
            Reporter: wesharn