Hive / HIVE-7060

Column stats give incorrect min and distinct_count

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.0
    • Component/s: Statistics
    • Labels: None

    Description

      The column statistics appear to be incorrect on two measures for numeric columns: min (which is always 0) and distinct_count. Here is an example, where the query returns the exact distinct count and minimum:

      select count(distinct avgTimeOnSite), min(avgTimeOnSite) from UserVisits_web_text_none;
      ...
      OK
      9	1
      Time taken: 9.747 seconds, Fetched: 1 row(s)
      

      The statistics stored for the same column report min 0 and distinct_count 11, whereas the query above returns 1 and 9:

      desc formatted UserVisits_web_text_none avgTimeOnSite
      ...
      # col_name              data_type               min                     max                     num_nulls               distinct_count          avg_col_len             max_col_len             num_trues               num_falses              comment
      
      avgTimeOnSite           int                     0                       9                       0                       11                      null                    null                    null               
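
      A minimal sketch of how the mismatch above might be reproduced end to end. The report does not show how the column statistics were gathered, so the ANALYZE statement below is an assumption; the two statements after it are the ones from the report.

      ```sql
      -- Assumption: stats were gathered with the standard HiveQL column-statistics statement;
      -- the report itself does not show this step.
      ANALYZE TABLE UserVisits_web_text_none COMPUTE STATISTICS FOR COLUMNS avgTimeOnSite;

      -- Exact values computed from the data (query from the report).
      SELECT COUNT(DISTINCT avgTimeOnSite), MIN(avgTimeOnSite) FROM UserVisits_web_text_none;

      -- Stored column statistics, to compare against the exact values above.
      DESC FORMATTED UserVisits_web_text_none avgTimeOnSite;
      ```

      Comparing the SELECT result with the min and distinct_count columns of the DESC FORMATTED output shows whether the stored statistics diverge from the data.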
      

People

    • Assignee: Unassigned
    • Reporter: Xuefu Zhang