Hive / HIVE-27725

Remove redundant columns in TAB_COL_STATS and PART_COL_STATS

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0, 4.0.0-beta-1
    • Fix Version/s: 4.1.0
    • Hive

    Description

      The TAB_COL_STATS table includes the CAT_NAME, DB_NAME and TABLE_NAME columns, which can instead be obtained by joining the TBLS and DBS tables on the TBL_ID and DB_ID columns.
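To illustrate the normalized layout, here is a minimal sketch using Python's standard-library sqlite3 module. It assumes a heavily simplified subset of the HMS schema (only the columns shown; the real tables have many more), and the helper `table_col_stats` and the sample data are hypothetical. The stats row keeps only TBL_ID, and the catalog, database and table names are recovered through the join.

```python
import sqlite3

# Simplified subset of the HMS schema (assumption: only the columns needed here).
# TAB_COL_STATS keeps only TBL_ID; the names live in TBLS and DBS.
SCHEMA = """
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT, CTLG_NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE TAB_COL_STATS (
    CS_ID INTEGER PRIMARY KEY,
    TBL_ID INTEGER,
    COLUMN_NAME TEXT,
    NUM_NULLS INTEGER
);
"""


def table_col_stats(conn, cat, db, table):
    """Hypothetical helper: fetch column stats by name, resolving names via TBLS and DBS."""
    return conn.execute(
        """
        SELECT d.CTLG_NAME, d.NAME, t.TBL_NAME, s.COLUMN_NAME, s.NUM_NULLS
        FROM TAB_COL_STATS s
        JOIN TBLS t ON s.TBL_ID = t.TBL_ID
        JOIN DBS d ON t.DB_ID = d.DB_ID
        WHERE d.CTLG_NAME = ? AND d.NAME = ? AND t.TBL_NAME = ?
        ORDER BY s.COLUMN_NAME
        """,
        (cat, db, table),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO DBS VALUES (1, 'sales', 'hive')")
conn.execute("INSERT INTO TBLS VALUES (10, 1, 'orders')")
conn.executemany("INSERT INTO TAB_COL_STATS VALUES (?, ?, ?, ?)",
                 [(100, 10, 'id', 0), (101, 10, 'amount', 3)])
print(table_col_stats(conn, 'hive', 'sales', 'orders'))
```

Because the names are read from TBLS and DBS at query time, a rename only has to update those tables, not every stats row.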

      The PART_COL_STATS table includes the CAT_NAME, DB_NAME, TABLE_NAME and PARTITION_NAME columns, which can instead be obtained by joining the PARTITIONS, TBLS and DBS tables on the PART_ID, TBL_ID and DB_ID columns.
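The partition-level case adds one more hop. The sketch below again uses a simplified, assumed subset of the schema, and the helper `partition_col_stats` and the sample partitions are hypothetical. Each stats row carries only PART_ID, and the names are resolved through PARTITIONS, TBLS and DBS.

```python
import sqlite3

# Simplified subset of the HMS schema (assumption): PART_COL_STATS keeps only PART_ID.
SCHEMA = """
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT, CTLG_NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT);
CREATE TABLE PART_COL_STATS (
    CS_ID INTEGER PRIMARY KEY,
    PART_ID INTEGER,
    COLUMN_NAME TEXT,
    NUM_NULLS INTEGER
);
"""


def partition_col_stats(conn, cat, db, table, part_name):
    """Hypothetical helper: resolve names via PARTITIONS -> TBLS -> DBS."""
    return conn.execute(
        """
        SELECT d.CTLG_NAME, d.NAME, t.TBL_NAME, p.PART_NAME, s.COLUMN_NAME, s.NUM_NULLS
        FROM PART_COL_STATS s
        JOIN PARTITIONS p ON s.PART_ID = p.PART_ID
        JOIN TBLS t ON p.TBL_ID = t.TBL_ID
        JOIN DBS d ON t.DB_ID = d.DB_ID
        WHERE d.CTLG_NAME = ? AND d.NAME = ? AND t.TBL_NAME = ? AND p.PART_NAME = ?
        ORDER BY s.COLUMN_NAME
        """,
        (cat, db, table, part_name),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO DBS VALUES (1, 'sales', 'hive')")
conn.execute("INSERT INTO TBLS VALUES (10, 1, 'orders')")
conn.executemany("INSERT INTO PARTITIONS VALUES (?, ?, ?)",
                 [(20, 10, 'ds=2023-09-01'), (21, 10, 'ds=2023-09-02')])
conn.executemany("INSERT INTO PART_COL_STATS VALUES (?, ?, ?, ?)",
                 [(200, 20, 'amount', 1), (201, 21, 'amount', 5)])
print(partition_col_stats(conn, 'hive', 'sales', 'orders', 'ds=2023-09-02'))
```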

      In addition, HMS currently fetches table column statistics without joining any other table, but deletes them with a join on TBLS. This inconsistency can cause an exception in a corner case: if some column statistics are left behind when a table is dropped, and the user then recreates a table with the same name in the same database, the new table gets a different TBL_ID. The get API still returns the old table's column statistics, which is incorrect, and if the user tries to delete the statistics returned by the get API, a NoSuchObjectException is thrown.
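The corner case can be reproduced in a toy model. This is a minimal sketch, not the actual HMS code path: it assumes a simplified pre-change TAB_COL_STATS that stores DB_NAME and TABLE_NAME next to TBL_ID, models the get path as a name-only filter and the delete path as a join on TBLS, and uses a local `NoSuchObjectException` class as a stand-in for the metastore exception of the same name. All helper names are hypothetical.

```python
import sqlite3


class NoSuchObjectException(Exception):
    """Local stand-in for the metastore exception of the same name (illustration only)."""


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
-- Pre-change layout (simplified): names are copied into the stats row next to TBL_ID.
CREATE TABLE TAB_COL_STATS (
    CS_ID INTEGER PRIMARY KEY, TBL_ID INTEGER,
    DB_NAME TEXT, TABLE_NAME TEXT, COLUMN_NAME TEXT
);
INSERT INTO DBS VALUES (1, 'sales');
INSERT INTO TBLS VALUES (10, 1, 'orders');
INSERT INTO TAB_COL_STATS VALUES (100, 10, 'sales', 'orders', 'id');
-- Corner case: the table is dropped but its stats row is left behind...
DELETE FROM TBLS WHERE TBL_ID = 10;
-- ...and a table with the same name is recreated under a new TBL_ID.
INSERT INTO TBLS VALUES (11, 1, 'orders');
""")


def get_stats_by_name(db, table):
    # Get path without a join: filters on the copied name columns only.
    return conn.execute(
        "SELECT CS_ID, TBL_ID, COLUMN_NAME FROM TAB_COL_STATS "
        "WHERE DB_NAME = ? AND TABLE_NAME = ?", (db, table)).fetchall()


def get_stats_joined(db, table):
    # Join-based get path: resolves the name to the current TBL_ID first.
    return conn.execute(
        "SELECT s.CS_ID, s.TBL_ID, s.COLUMN_NAME FROM TAB_COL_STATS s "
        "JOIN TBLS t ON s.TBL_ID = t.TBL_ID JOIN DBS d ON t.DB_ID = d.DB_ID "
        "WHERE d.NAME = ? AND t.TBL_NAME = ?", (db, table)).fetchall()


def delete_stats(db, table, column):
    # Delete path joins TBLS, so it only sees stats owned by the current table.
    cur = conn.execute(
        "DELETE FROM TAB_COL_STATS WHERE COLUMN_NAME = ? AND TBL_ID IN ("
        " SELECT t.TBL_ID FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID"
        " WHERE d.NAME = ? AND t.TBL_NAME = ?)", (column, db, table))
    if cur.rowcount == 0:
        raise NoSuchObjectException(f"no stats for {db}.{table}.{column}")


print(get_stats_by_name("sales", "orders"))  # stale row that belongs to TBL_ID 10
print(get_stats_joined("sales", "orders"))   # empty: no stats for TBL_ID 11
try:
    delete_stats("sales", "orders", "id")
except NoSuchObjectException as e:
    print("delete failed:", e)
```

In this model the name-only read and the joined delete disagree about which rows belong to sales.orders. Reading through the join, as the normalized layout requires, makes the two paths agree.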

            People

              Assignee: Wechar
              Reporter: Wechar
              Votes: 0
              Watchers: 3
