Hive / HIVE-3962

Number of distinct values is wrong in column statistics

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: Statistics
    • Labels: None

    Description

      When we run the following query on the Hive src table:

      select count(distinct(key)), count(distinct(value)) from src;
      309 309

      After running the following analyze query, the stats in the metastore seem wrong (NUM_DISTINCTS is 291 for key and 112 for value, rather than 309):

      analyze table src compute statistics for columns key, value;

      -- stats in metastore --

      mysql > select * from TAB_COL_STATS where TABLE_NAME="src";

      | CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
      | 5     | default | src        | key         | int         | 11     | 0              | 498             | 0.0000            | 0.0000           | NULL                  | NULL                   | 0         | 291           | 0.0000      | 0           | 0         | 0          | 1359539181    |
      | 6     | default | src        | value       | string      | 11     | 0              | 0               | 0.0000            | 0.0000           | NULL                  | NULL                   | 0         | 112           | 6.8120      | 7           | 0         | 0          | 1359539181    |
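
      To put a number on the mismatch, here is a minimal sketch in Python that compares the NUM_DISTINCTS values stored in the metastore with the exact counts returned by the count(distinct ...) query. All four numbers are copied from this report; the helper name relative_error is only illustrative and is not part of Hive.

```python
# Exact distinct counts, from the HiveQL count(distinct ...) query in this report.
exact = {"key": 309, "value": 309}

# NUM_DISTINCTS values, from the TAB_COL_STATS rows in this report.
estimated = {"key": 291, "value": 112}


def relative_error(exact_count, estimated_count):
    """Return |estimated - exact| / exact as a fraction (hypothetical helper)."""
    return abs(estimated_count - exact_count) / exact_count


errors = {col: relative_error(exact[col], estimated[col]) for col in exact}

for col, err in errors.items():
    print(f"{col}: exact={exact[col]} stored={estimated[col]} error={err:.1%}")
```

      Under these numbers the key column is off by roughly 6% and the value column by roughly 64%, so the value column is by far the larger discrepancy.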

          People

            Assignee: Unassigned
            Reporter: Amareshwari Sriramadasu