SPARK-16926: Partition columns are present in columns metadata for partition but not table


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      A change introduced in SPARK-14388 removes partition columns from the column metadata of tables, but not from that of partitions. This causes TableReader to conclude that the schemas differ and to create an unnecessary conversion object inspector, taking the else codepath in TableReader below:

          val soi = if (rawDeser.getObjectInspector.equals(tableDeser.getObjectInspector)) {
            rawDeser.getObjectInspector.asInstanceOf[StructObjectInspector]
          } else {
            ObjectInspectorConverters.getConvertedOI(
              rawDeser.getObjectInspector,
              tableDeser.getObjectInspector).asInstanceOf[StructObjectInspector]
          }
      

      Printing the properties as debug output confirms the difference for the Hive table.

      Table properties (tableDesc.getProperties):

      16/08/04 20:36:58 DEBUG HadoopTableReader: columns.types, string:bigint:string:bigint:bigint:array<string>
      

      Partition properties (partProps):

      16/08/04 20:36:58 DEBUG HadoopTableReader: columns.types, string:bigint:string:bigint:bigint:array<string>:string:string:string
      

      The final three string columns are the partition columns.
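The discrepancy can be surfaced directly by diffing the two columns.types property strings above. A minimal sketch (the strings are copied from the debug output; the naive ':' split is illustrative only and would break on nested struct types, which do not appear here):

```scala
// Compares the table's and the partition's "columns.types" property
// to isolate the partition columns that appear only in the partition
// metadata.
object ColumnTypesDiff {
  // Values taken from the DEBUG HadoopTableReader output above
  val tableTypes = "string:bigint:string:bigint:bigint:array<string>"
  val partTypes  = "string:bigint:string:bigint:bigint:array<string>:string:string:string"

  // Naive split on ':' works for these flat types (no struct<...> fields)
  def columns(types: String): Seq[String] = types.split(':').toSeq

  // Columns present in the partition metadata but absent from the table's
  def extraPartitionColumns: Seq[String] =
    columns(partTypes).drop(columns(tableTypes).length)

  def main(args: Array[String]): Unit =
    println(extraPartitionColumns.mkString(","))
}
```

Running this prints the three trailing string columns, matching the partition columns described above.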


          People

            Assignee: Brian Cho (chobrian)
            Reporter: Brian Cho (chobrian)
            Votes: 0
            Watchers: 4
