SPARK-39926 (sub-task of SPARK-38334: Implement support for DEFAULT values for columns in tables)

Fix bug in existence DEFAULT value lookups for non-vectorized Parquet scans

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      How to reproduce:

      set spark.sql.parquet.enableVectorizedReader=false;
      create table t(a int) using parquet;
      insert into t values (42);
      alter table t add column b int default 42;
      insert into t values (43, null);
      select * from t;
      

      This should return two rows:

      (42, 42) and (43, NULL)

      Instead, the scan misses the explicitly inserted NULL value and returns the existence DEFAULT value of 42 in its place:

      (42, 42) and (43, 42).

       

      This bug happens because the Parquet API calls one of these set* methods in ParquetRowConverter.scala whenever it finds a non-NULL value:

      private class RowUpdater(row: InternalRow, ordinal: Int)
          extends ParentContainerUpdater {
        override def set(value: Any): Unit = row(ordinal) = value
        override def setBoolean(value: Boolean): Unit = row.setBoolean(ordinal, value)
        override def setByte(value: Byte): Unit = row.setByte(ordinal, value)
        override def setShort(value: Short): Unit = row.setShort(ordinal, value)
        override def setInt(value: Int): Unit = row.setInt(ordinal, value)
        override def setLong(value: Long): Unit = row.setLong(ordinal, value)
        override def setDouble(value: Double): Unit = row.setDouble(ordinal, value)
        override def setFloat(value: Float): Unit = row.setFloat(ordinal, value)
      }
       

       
      But it never calls anything like "setNull()" when encountering a NULL value, so the row slot keeps whatever value it already held, which here is the existence DEFAULT value.
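      The mechanism described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not Spark's code: ExistenceDefaultBug, SlotUpdater, and readRow are hypothetical names, and the model assumes the row is pre-filled with existence DEFAULT values before the setters run.

```scala
// Toy model of the non-vectorized read path. All names are illustrative, not Spark's.
object ExistenceDefaultBug {
  // Stand-in for RowUpdater: it only offers a setter for non-NULL values.
  final class SlotUpdater(row: Array[Any], ordinal: Int) {
    def set(value: Any): Unit = row(ordinal) = value
  }

  // defaults: existence DEFAULT per table column (None if the column has none).
  // record:   values stored in the file for one row; None stands for a stored NULL.
  def readRow(defaults: Seq[Option[Any]], record: Seq[Option[Any]]): List[Any] = {
    // Assumption: every slot starts out holding its existence DEFAULT (or null).
    val row: Array[Any] = defaults.map(_.getOrElse(null)).toArray
    record.zipWithIndex.foreach { case (field, i) =>
      // The setter is invoked only for non-NULL values; a stored NULL triggers nothing.
      field.foreach(v => new SlotUpdater(row, i).set(v))
    }
    row.toList
  }

  def main(args: Array[String]): Unit = {
    val defaults = Seq(None, Some(42))               // columns a (no default) and b (DEFAULT 42)
    println(readRow(defaults, Seq(Some(42))))        // file written before b existed: List(42, 42)
    println(readRow(defaults, Seq(Some(43), None)))  // b stored as NULL: List(43, 42), the bug
  }
}
```

      In this model the second row comes back as (43, 42) because nothing ever overwrites the pre-filled slot, which matches the behavior reported above.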

      To fix the bug, we need to know how many columns of data were present in each row of the Parquet data, so we can differentiate between a NULL value and a missing column.
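      One possible shape of that idea, continuing the toy model rather than showing the actual Spark change: ExistenceDefaultFix and readRow are hypothetical names, and the sketch assumes the columns present in a file form a prefix of the table schema (as happens when columns are appended with ALTER TABLE ... ADD COLUMN).

```scala
// Toy model of the proposed approach. All names are illustrative, not Spark's.
object ExistenceDefaultFix {
  // defaults: existence DEFAULT per table column (None if the column has none).
  // record:   values stored in the file for one row; None stands for a stored NULL.
  //           Its length is the number of columns physically present in the file.
  def readRow(defaults: Seq[Option[Any]], record: Seq[Option[Any]]): List[Any] = {
    // Start from an all-NULL row instead of a row pre-filled with defaults.
    val row: Array[Any] = Array.fill[Any](defaults.length)(null)
    record.zipWithIndex.foreach { case (field, i) =>
      // Non-NULL values are written; stored NULLs leave the slot as null.
      field.foreach(v => row(i) = v)
    }
    // Assumption: columns at positions >= record.length are missing from the file,
    // so only those slots receive the existence DEFAULT.
    for (i <- record.length until defaults.length) {
      row(i) = defaults(i).getOrElse(null)
    }
    row.toList
  }

  def main(args: Array[String]): Unit = {
    val defaults = Seq(None, Some(42))
    println(readRow(defaults, Seq(Some(42))))        // missing column b: List(42, 42)
    println(readRow(defaults, Seq(Some(43), None)))  // stored NULL in b: List(43, null)
  }
}
```

      Counting the columns present in the data is what lets the reader tell a stored NULL apart from a column that was never written.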

          People

            Assignee: dtenedor Daniel
            Reporter: dtenedor Daniel