Spark / SPARK-29295

Duplicate result when dropping partition of an external table and then overwriting


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Versions: 2.2.3, 2.3.4, 2.4.5
    • Fix Versions: 2.4.6, 3.0.0
    • Component: SQL

    Description

      When we drop a partition of an external table and then overwrite it with spark.sql.hive.convertMetastoreParquet=true (the default), the partition is overwritten as expected.
      But with spark.sql.hive.convertMetastoreParquet=false, the query returns a duplicate result: both the old row and the new row.

      Here is a reproduction (you can add it to SQLQuerySuite in the hive module):

        test("spark gives duplicate result when dropping a partition of an external partitioned table" +
          " first and then overwriting it") {
          withTable("test") {
            withTempDir { f =>
              sql("create external table test(id int) partitioned by (name string) stored as " +
                s"parquet location '${f.getAbsolutePath}'")
      
              withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> false.toString) {
                sql("insert overwrite table test partition(name='n1') select 1")
                sql("ALTER TABLE test DROP PARTITION(name='n1')")
                sql("insert overwrite table test partition(name='n1') select 2")
                checkAnswer(sql("select id from test where name = 'n1' order by id"),
                  Array(Row(1), Row(2)))
              }
      
              withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) {
                sql("insert overwrite table test partition(name='n1') select 1")
                sql("ALTER TABLE test DROP PARTITION(name='n1')")
                sql("insert overwrite table test partition(name='n1') select 2")
                checkAnswer(sql("select id from test where name = 'n1' order by id"),
                  Array(Row(2)))
              }
            }
          }
        }
      
      The same behavior reproduces in spark-sql:

      create external table test(id int) partitioned by (name string) stored as parquet location '/tmp/p';
      set spark.sql.hive.convertMetastoreParquet=false;
      insert overwrite table test partition(name='n1') select 1;
      ALTER TABLE test DROP PARTITION(name='n1');
      insert overwrite table test partition(name='n1') select 2;
      select id from test where name = 'n1' order by id;  -- returns 1 and 2 (the duplicate result)
      
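A minimal in-memory sketch (no Spark required) of the suspected mechanism. For an external table, DROP PARTITION removes only the metastore entry and leaves the partition's data files on disk; with convertMetastoreParquet=false the Hive write path treats the just-dropped partition as new and does not clear the old files before writing, so a later scan sees both rows. Object and method names here are illustrative, not Spark or Hive APIs:

```scala
import scala.collection.mutable

// Hypothetical model of one partition (name='n1') of an external table.
object DropPartitionSketch extends App {
  // Data files currently on disk for the partition; they survive
  // DROP PARTITION because that is metadata-only for an external table.
  val dataFiles = mutable.ListBuffer.empty[Int]
  var partitionInMetastore = false

  def insertOverwrite(value: Int, convertMetastoreParquet: Boolean): Unit = {
    // The datasource path (convertMetastoreParquet=true) always clears the
    // target directory; the Hive path clears it only when the metastore
    // already knows the partition.
    if (convertMetastoreParquet || partitionInMetastore) dataFiles.clear()
    dataFiles += value
    partitionInMetastore = true
  }

  def dropPartition(): Unit = {
    partitionInMetastore = false  // files under the partition dir remain
  }

  insertOverwrite(1, convertMetastoreParquet = false)  // files: [1]
  dropPartition()                                      // files still: [1]
  insertOverwrite(2, convertMetastoreParquet = false)  // files: [1, 2] <- bug
  println(dataFiles.sorted.mkString(","))              // prints 1,2
}
```

Under this model, flipping convertMetastoreParquet to true on the second insert clears the stale file first, matching the correct Array(Row(2)) answer in the test above.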


      People

        Assignee: viirya (L. C. Hsieh)
        Reporter: hzfeiwang (feiwang)