SPARK-23525

ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.3.0
    • Fix Version/s: 2.2.2, 2.3.1, 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      print(spark.sql("""
      SHOW CREATE TABLE test.trends
      """).collect()[0].createtab_stmt)
      
      // OUTPUT
      CREATE EXTERNAL TABLE `test`.`trends`(`id` string COMMENT '', `metric` string COMMENT '', `amount` bigint COMMENT '')
      COMMENT ''
      PARTITIONED BY (`date` string COMMENT '')
      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      WITH SERDEPROPERTIES (
        'serialization.format' = '1'
      )
      STORED AS
        INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION 's3://xxxxx/xxxxx/xxxx'
      TBLPROPERTIES (
        'transient_lastDdlTime' = '1519729384',
        'last_modified_time' = '1519645652',
        'last_modified_by' = 'pavlo',
        'last_castor_run_ts' = '1513561658.0'
      )
      
      
      spark.sql("""
      DESCRIBE test.trends
      """).collect()
      
      // OUTPUT
      [Row(col_name='id', data_type='string', comment=''),
       Row(col_name='metric', data_type='string', comment=''),
       Row(col_name='amount', data_type='bigint', comment=''),
       Row(col_name='date', data_type='string', comment=''),
       Row(col_name='# Partition Information', data_type='', comment=''),
       Row(col_name='# col_name', data_type='data_type', comment='comment'),
       Row(col_name='date', data_type='string', comment='')]
      
      
      spark.sql("""alter table test.trends change column id id string comment 'unique identifier'""")
      
      
      spark.sql("""
      DESCRIBE test.trends
      """).collect()
      
      // OUTPUT
      [Row(col_name='id', data_type='string', comment=''),
       Row(col_name='metric', data_type='string', comment=''),
       Row(col_name='amount', data_type='bigint', comment=''),
       Row(col_name='date', data_type='string', comment=''),
       Row(col_name='# Partition Information', data_type='', comment=''),
       Row(col_name='# col_name', data_type='data_type', comment='comment'),
       Row(col_name='date', data_type='string', comment='')]
      

      The strange thing is that I successfully assigned a comment to the id field from Hive, and it is visible in the Hue UI, but it is still not visible from Spark, and no Spark statement has any effect on the comments.
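
The before/after comparison above could be automated along these lines. This is only a minimal sketch: the helper name `column_comments` is hypothetical (it is not part of Spark), and the `Row` namedtuple is a stand-in for `pyspark.sql.Row` so that the snippet runs without a Spark session. The sample rows are copied from the second DESCRIBE output in this report.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row so the sketch runs without a Spark session.
Row = namedtuple("Row", ["col_name", "data_type", "comment"])


def column_comments(describe_rows):
    """Map column name -> comment, stopping at the first '#' section header
    (e.g. '# Partition Information') so partition columns are not listed twice."""
    comments = {}
    for row in describe_rows:
        if row.col_name.startswith("#"):
            break
        comments[row.col_name] = row.comment
    return comments


# DESCRIBE output copied from the report, taken after the ALTER TABLE statement ran.
after_alter = [
    Row("id", "string", ""),
    Row("metric", "string", ""),
    Row("amount", "bigint", ""),
    Row("date", "string", ""),
    Row("# Partition Information", "", ""),
    Row("# col_name", "data_type", "comment"),
    Row("date", "string", ""),
]

comments = column_comments(after_alter)
# The comment set by ALTER TABLE ... CHANGE COLUMN is missing, which is the reported bug.
print(comments["id"] == "unique identifier")
```

Against a live session, the same helper would presumably be called as `column_comments(spark.sql("DESCRIBE test.trends").collect())`, since the rows returned there expose the same three fields by attribute.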

       

            People

            • Assignee:
              jiangxb1987 Xingbo Jiang
              Reporter:
              pavlo.skliar Pavlo Skliar
              Shepherd:
              Xingbo Jiang
            • Votes:
              0
              Watchers:
              5