  Hive / HIVE-11160 Auto-gather column stats / HIVE-18149

Stats: rownum estimation from datasize underestimates in most cases

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Statistics
    • Labels: None

    Description

      Row count estimation currently relies on the following:

      • The data size is taken from one of these sources:
        • Basic stats aggregate the "on-heap" sizes of the loaded rows. Other readers can provide a "raw size" estimate; I've checked ORC, and I expect others do the same, although the API docs are a bit vague about the purpose of these methods.
        • If basic-stats-level info is not available, the sum of the filesystem-level file sizes is used as the "raw data size" and multiplied by the deserialization ratio, which is currently 1.
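
The fallback order described in the list can be sketched roughly as follows. This is an illustration only, not Hive's actual code: the class `DataSizeEstimate`, the method `estimateDataSize`, and the convention that a negative value means "no basic stats available" are all hypothetical and chosen just for this example.

```java
/** Illustrative sketch only; all names here are hypothetical, not Hive's API. */
final class DataSizeEstimate {

    /**
     * Picks the data size that feeds the row count estimate.
     *
     * @param rawDataSize   raw data size from basic stats, or a negative value
     *                      if it is unavailable (assumed convention)
     * @param totalFileSize sum of the file sizes on the filesystem, in bytes
     * @param deserFactor   deserialization ratio applied to the on-disk size
     */
    static long estimateDataSize(long rawDataSize, long totalFileSize, double deserFactor) {
        if (rawDataSize >= 0) {
            // Basic stats are present: use them as-is.
            return rawDataSize;
        }
        // Fall back to the on-disk size scaled by the deserialization ratio.
        return (long) (totalFileSize * deserFactor);
    }

    public static void main(String[] args) {
        // No basic stats, ratio of 1: the on-disk size is used unchanged.
        System.out.println(estimateDataSize(-1, 169, 1.0)); // prints 169
    }
}
```

With a ratio of 1, the fallback path treats on-disk bytes as if they were in-memory bytes, which is the mismatch this issue describes.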

      The problem with all of this is that the deserialization factor is 1, while the row size includes the in-memory (on-heap) object headers.

      Example: 20 rows are loaded into a partition in columnstats_partlvl_dp.q.

      After HIVE-18108, the explain in this test estimates the row size of the table at 404 bytes, but the 20 rows of text total only 169 bytes, so the estimate ends up at 0 rows.
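
The arithmetic behind the 0-row result can be reproduced with a small sketch. It assumes the estimate is a truncating integer division of data size by row size; the class `RowCountEstimate` and the method `estimateRows` are hypothetical names for illustration and are not taken from the Hive code base.

```java
/** Illustrative sketch only; names are hypothetical, not Hive's API. */
final class RowCountEstimate {

    /**
     * Estimates the row count as dataSize / avgRowSize.
     * Assumes truncating integer division, which would explain the 0 above.
     */
    static long estimateRows(long dataSize, long avgRowSize) {
        if (avgRowSize <= 0) {
            throw new IllegalArgumentException("avgRowSize must be positive");
        }
        return dataSize / avgRowSize;
    }

    public static void main(String[] args) {
        long dataSize = 169;   // bytes of text on disk (from the example)
        long avgRowSize = 404; // estimated in-memory row size (from the example)
        System.out.println(estimateRows(dataSize, avgRowSize)); // prints 0

        // For comparison: 169 bytes spread over the 20 real rows is about 8 bytes per row.
        System.out.println(dataSize / 20); // prints 8
    }
}
```

The on-disk size per row (about 8 bytes) is far smaller than the 404-byte in-memory row size, so dividing one by the other underestimates the row count.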

      Attachments

        1. HIVE-18149.01.patch
          2.58 MB
          Zoltan Haindrich
        2. HIVE-18149.01wip01.patch
          36 kB
          Zoltan Haindrich
        3. HIVE-18149.02.patch
          2.60 MB
          Zoltan Haindrich
        4. HIVE-18149.03.patch
          2.64 MB
          Zoltan Haindrich
        5. HIVE-18149.03wip01.patch
          2.61 MB
          Zoltan Haindrich
        6. HIVE-18149.03wip02.patch
          2.61 MB
          Zoltan Haindrich

            People

              Assignee: kgyrtkirk Zoltan Haindrich
              Reporter: kgyrtkirk Zoltan Haindrich