Spark / SPARK-6242

Support replace (drop) column for parquet table


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

      Description

      SPARK-5528 provides an easy way to add columns to Parquet tables. It relies on Parquet's native ability to merge the schemas of all part-files and _common_metadata files.
      However, dropping a column from a Parquet table still does not work. The merged schema still contains the dropped column, but the column is no longer present in the metastore. The schemas obtained from the two sources therefore do not match, and any subsequent query on the table fails.
      Instead of requiring an exact match between the two schemas, Spark should only check whether the schema obtained from the metastore is a subset of the merged Parquet schema. If this check passes, queries should be allowed to reference the columns present in the metastore.
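
      The proposed check can be illustrated with a minimal, self-contained sketch. This is plain Python rather than Spark code: the schema representation (lists of name/type pairs) and the helper names `merge_schemas` and `reconcile` are hypothetical and chosen only for illustration; they are not Spark's actual API or implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical schema representation: ordered (column name, type name) pairs.
Field = Tuple[str, str]


def merge_schemas(part_schemas: List[List[Field]]) -> List[Field]:
    """Union the fields of all part-file schemas, keeping first-seen order."""
    merged: Dict[str, str] = {}
    for schema in part_schemas:
        for name, dtype in schema:
            if name in merged and merged[name] != dtype:
                raise ValueError(f"conflicting types for column {name!r}")
            merged.setdefault(name, dtype)
    return list(merged.items())


def reconcile(metastore: List[Field], parquet_merged: List[Field]) -> List[Field]:
    """Accept the metastore schema if it is a subset of the merged Parquet schema."""
    available = dict(parquet_merged)
    for name, dtype in metastore:
        if available.get(name) != dtype:
            raise ValueError(f"column {name!r} ({dtype}) not found in Parquet schema")
    # Only the metastore columns may be referenced by queries.
    return list(metastore)


# Older part-files still contain the dropped column "age".
parts = [
    [("id", "int"), ("name", "string")],
    [("id", "int"), ("name", "string"), ("age", "int")],
]
merged = merge_schemas(parts)

# The metastore no longer lists "age" after the drop.
metastore = [("id", "int"), ("name", "string")]

exact_match = metastore == merged          # the strict check described above
queryable = reconcile(metastore, merged)   # the proposed subset check
```

      In this sketch the strict equality comparison fails because the merged schema still carries the dropped column, while the subset check succeeds and exposes only the columns listed in the metastore.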


              People

              • Assignee: Cheng Lian
              • Reporter: chirag aggarwal
              • Votes: 0
              • Watchers: 2
