  Hive / HIVE-25293

Altering a partitioned table with the "cascade" option creates too many column records


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.3, 3.1.2
    • Fix Version/s: None
    • Component/s: Metastore

    Description

      When a partitioned table is altered with the "cascade" option, all of its partitions are supposed to be updated as well. Currently, a new CD_ID is created for each partition, each associated with its own set of column records, which leaves a large amount of redundant data in the metastore database.

      The following DDL statements can reproduce this scenario:

      create table test_table (f1 int) partitioned by (p string);
      alter table test_table add partition(p='a');
      alter table test_table add partition(p='b');
      alter table test_table add partition(p='c');
      alter table test_table add columns (f2 int) cascade;

      Before the columns are added, all partitions share the table's `CD_ID`; after the columns are added, each partition has its own `CD_ID`.

      My proposal is that all partitions should share a single `CD_ID` when the table is altered with the "cascade" option.
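      The difference between the current behaviour and the proposal can be compared with a minimal Python sketch. It assumes a heavily simplified model of the metastore tables involved (CDS, COLUMNS_V2, SDS and PARTITIONS, each reduced to the few columns needed here) in an in-memory SQLite database, and it leaves out the table's own descriptor. The helpers `new_cd`, `cascade_add_columns` and `summarize` are hypothetical and do not reflect how the metastore actually writes these rows; they only count what each layout leaves behind for the three partitions in the reproduction above.

```python
import sqlite3

# Hypothetical, heavily simplified subset of the metastore schema: only the
# columns needed to follow partition -> storage descriptor -> column descriptor.
SCHEMA = """
CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT, INTEGER_IDX INTEGER);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT, SD_ID INTEGER);
"""


def new_cd(conn, columns):
    """Create a column descriptor plus one COLUMNS_V2 row per column; return its id."""
    cd_id = conn.execute("INSERT INTO CDS DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO COLUMNS_V2 VALUES (?, ?, ?, ?)",
        [(cd_id, name, typ, idx) for idx, (name, typ) in enumerate(columns)],
    )
    return cd_id


def cascade_add_columns(partition_values, columns, share_cd):
    """Model the partition metadata left behind by ADD COLUMNS ... CASCADE.

    share_cd=False mimics the reported behaviour (one new CD per partition);
    share_cd=True mimics the proposal (one CD reused by every partition).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    shared = new_cd(conn, columns) if share_cd else None
    for value in partition_values:
        cd_id = shared if share_cd else new_cd(conn, columns)
        sd_id = conn.execute("INSERT INTO SDS (CD_ID) VALUES (?)", (cd_id,)).lastrowid
        conn.execute(
            "INSERT INTO PARTITIONS (PART_NAME, SD_ID) VALUES (?, ?)",
            (f"p={value}", sd_id),
        )
    return conn


def summarize(conn):
    """Return (distinct CD_IDs referenced by partitions, total COLUMNS_V2 rows)."""
    distinct_cds = conn.execute(
        "SELECT COUNT(DISTINCT s.CD_ID) "
        "FROM PARTITIONS p JOIN SDS s ON p.SD_ID = s.SD_ID"
    ).fetchone()[0]
    column_rows = conn.execute("SELECT COUNT(*) FROM COLUMNS_V2").fetchone()[0]
    return distinct_cds, column_rows


cols = [("f1", "int"), ("f2", "int")]
print(summarize(cascade_add_columns(["a", "b", "c"], cols, share_cd=False)))  # (3, 6)
print(summarize(cascade_add_columns(["a", "b", "c"], cols, share_cd=True)))   # (1, 2)
```

      In this model, P partitions with C columns leave P × C column rows under the per-partition layout, but only C rows when the descriptor is shared, independent of P.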

          People

            Assignee: yeats liao (yongtaoliao)
            Reporter: yeats liao (yongtaoliao)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 20m