Hive / HIVE-11160 Auto-gather column stats / HIVE-18149

Stats: rownum estimation from datasize underestimates in most cases


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Statistics
    • Labels: None

    Description

      Row count estimation currently works as follows:

      • The data size comes from one of these sources:
        • Basic stats, which aggregate the loaded "on-heap" row sizes. Other readers can provide a "raw size" estimate instead; I've checked ORC, and I expect the others do the same, although the API docs are vague about the purpose of these methods.
        • If basic-stats-level info is not available, the filesystem-level sum of file sizes is used as the "raw data size" and multiplied by the deserialization ratio, which is currently 1.
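The fallback between the two data size sources can be sketched as follows. This is a minimal illustration rather than Hive's actual code: the function name `estimate_data_size` and its parameters are hypothetical, and treating a missing or non-positive raw size as "not available" is an assumption.

```python
from typing import Optional


def estimate_data_size(raw_data_size: Optional[int],
                       total_file_size: int,
                       deser_factor: float = 1.0) -> int:
    """Pick the data size used for row count estimation (hypothetical helper)."""
    # Prefer the raw data size recorded in the basic stats, when present.
    if raw_data_size is not None and raw_data_size > 0:
        return raw_data_size
    # Otherwise fall back to the sum of file sizes on the filesystem,
    # scaled by the deserialization factor (1 at the time of this issue).
    return int(total_file_size * deser_factor)
```

With a factor of 1, the fallback path returns the on-disk size unchanged, so any gap between on-disk bytes and in-memory row size goes uncorrected.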

      The problem with all of this is that the deserialization factor is 1, while the row size includes the on-heap object headers.

      Example: in columnstats_partlvl_dp.q, 20 rows are loaded into a partition.

      After HIVE-18108, the explain for this test estimates the row size of the table at 404 bytes, but the 20 rows of text total only 169 bytes, so the estimate ends up at 0 rows.
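The arithmetic behind the example can be reproduced with a short sketch. The function name `estimate_row_count` is hypothetical, and the use of truncating integer division is an assumption made to match the "0 rows" outcome described above; Hive's real implementation may differ in detail.

```python
def estimate_row_count(data_size: int,
                       avg_row_size: int,
                       deser_factor: float = 1.0) -> int:
    """Estimate the row count from a data size and an average row size."""
    # Scale the on-disk size by the deserialization factor, then divide by
    # the estimated in-memory row size, truncating toward zero (assumed).
    return int(data_size * deser_factor) // avg_row_size


# Numbers from the issue: 169 bytes of text, 404-byte estimated row size.
print(estimate_row_count(169, 404))  # 0, although 20 rows were loaded
```

Because the estimated row size (404 bytes) is larger than the whole file (169 bytes), any factor of 1 yields zero rows regardless of how many rows the file actually holds.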

      Attachments

        1. HIVE-18149.01wip01.patch
          36 kB
          Zoltan Haindrich
        2. HIVE-18149.01.patch
          2.58 MB
          Zoltan Haindrich
        3. HIVE-18149.02.patch
          2.60 MB
          Zoltan Haindrich
        4. HIVE-18149.03wip01.patch
          2.61 MB
          Zoltan Haindrich
        5. HIVE-18149.03wip02.patch
          2.61 MB
          Zoltan Haindrich
        6. HIVE-18149.03.patch
          2.64 MB
          Zoltan Haindrich


          People

            Assignee: Zoltan Haindrich (kgyrtkirk)
            Reporter: Zoltan Haindrich (kgyrtkirk)
