Spark / SPARK-21617

ALTER TABLE...ADD COLUMNS broken in Hive 2.1 for DS tables

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      When you have a data source table and you run an "ALTER TABLE...ADD COLUMNS" query, Spark will save invalid metadata to the Hive metastore.

      Namely, it will overwrite the table's schema with the data frame's schema; that is not desired for data source tables (where the schema is stored in a table property instead).

      Moreover, if you use a newer metastore client where METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES is on by default, you actually get an exception:

      InvalidOperationException(message:The following columns have types incompatible with the existing columns in their respective positions :
      c1)
      	at org.apache.hadoop.hive.metastore.MetaStoreUtils.throwExceptionIfIncompatibleColTypeChange(MetaStoreUtils.java:615)
      	at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:133)
      	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3704)
      	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3675)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
      	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
      	at com.sun.proxy.$Proxy26.alter_table_with_environment_context(Unknown Source)
      	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:402)
      	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:309)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
      	at com.sun.proxy.$Proxy27.alter_table_with_environmentContext(Unknown Source)
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:601)
      

      That exception is handled by Spark in an odd way (see code in HiveExternalCatalog.scala) which still stores invalid metadata.
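
      To make the failure mode concrete, here is a minimal, self-contained sketch of a positional column type check like the one the error message describes. It is not Hive's code: the function name, the tuple representation of columns, and the exact-match type rule are all assumptions for illustration (the real check may accept some type conversions). The placeholder column stored in the metastore for the data source table is likewise an assumed scenario.

```python
from typing import List, Tuple

# (name, type): a hypothetical stand-in for a metastore column definition.
Column = Tuple[str, str]


class InvalidOperationException(Exception):
    """Stand-in for the metastore exception of the same name."""


def check_col_type_changes(old_cols: List[Column], new_cols: List[Column]) -> None:
    """Compare old and new columns position by position.

    Simplification (assumption): types must match exactly.
    Columns appended past the end of the old list are not checked.
    """
    incompatible = [
        new_name
        for (_, old_type), (new_name, new_type) in zip(old_cols, new_cols)
        if old_type != new_type
    ]
    if incompatible:
        raise InvalidOperationException(
            "The following columns have types incompatible with the existing "
            "columns in their respective positions :\n" + ",".join(incompatible)
        )


# Assumed scenario: the metastore holds only a placeholder column for the
# data source table, while the real schema (c1 INT) lives in a table property.
stored_cols = [("col", "array<string>")]

# ALTER TABLE ... ADD COLUMNS (c2 STRING) sends the full DataFrame schema,
# so c1 now sits in the position the placeholder occupied.
altered_cols = [("c1", "int"), ("c2", "string")]

try:
    check_col_type_changes(stored_cols, altered_cols)
    error_message = None
except InvalidOperationException as exc:
    error_message = str(exc)

print(error_message)
```

      Under these assumptions the check flags c1, because it is compared against whatever the metastore stored in position 0 rather than against the schema kept in the table property. When the stored columns already match, appending a column passes the check.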

          People

            Assignee: vanzin Marcelo Masiero Vanzin
            Reporter: vanzin Marcelo Masiero Vanzin