Spark / SPARK-16410

DataFrameWriter's jdbc method drops table in overwrite mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.1, 1.6.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Important

    Description

      According to the API documentation, the overwrite save mode should overwrite the existing data, which suggests that only the data is removed, i.e. the table is truncated.
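
      For illustration, a minimal invocation that runs into this behaviour could look as follows; the DataFrame df, the connection URL, the table name, and the credentials are placeholders:

      import java.util.Properties
      import org.apache.spark.sql.SaveMode

      // Overwrite an existing JDBC table; one would expect only the rows to be
      // replaced, not the table definition itself.
      val props = new Properties()
      props.setProperty("user", "etl")
      props.setProperty("password", "secret")

      df.write
        .mode(SaveMode.Overwrite)
        .jdbc("jdbc:postgresql://db-host:5432/warehouse", "accounts", props)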

      However, that is not what happens in the source code:

      if (mode == SaveMode.Overwrite && tableExists) {
        JdbcUtils.dropTable(conn, table)
        tableExists = false
      }
      

      This clearly shows that the table is first dropped and then recreated. This causes two major issues:

      • Existing indexes, partitioning schemes, etc. are completely lost.
      • The case of identifiers may be changed without the user understanding why.

      In my opinion, the table should be truncated, not dropped. Overwriting data is a DML operation and should not cause DDL.
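
      As a sketch of what a truncate-based overwrite could look like from the user's side, assuming the target table already exists and the database supports TRUNCATE TABLE (the URL, credentials, and table name are again placeholders, and df is an existing DataFrame):

      import java.sql.DriverManager
      import java.util.Properties
      import org.apache.spark.sql.SaveMode

      val url = "jdbc:postgresql://db-host:5432/warehouse"
      val table = "accounts"

      // Empty the table with plain JDBC so indexes, partitioning, and the case
      // of identifiers are preserved.
      val conn = DriverManager.getConnection(url, "etl", "secret")
      try {
        conn.createStatement().executeUpdate(s"TRUNCATE TABLE $table")
      } finally {
        conn.close()
      }

      // Append into the now-empty table; Append never drops or recreates it.
      val props = new Properties()
      props.setProperty("user", "etl")
      props.setProperty("password", "secret")

      df.write
        .mode(SaveMode.Append)
        .jdbc(url, table, props)

      This is only a workaround; the proposal here is that SaveMode.Overwrite itself should issue the TRUNCATE instead of a DROP and CREATE.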

            People

              Assignee: Unassigned
              Reporter: Ian Hellstrom (hellstorm)
              Votes: 0
              Watchers: 2
