Hive / HIVE-5369

Annotate hive operator tree with statistics from metastore


Details

    Description

      Currently, the statistics gathered at the table/partition level and at the column level are not used during the query planning stage, even though they can be used to optimize query plans. Basic statistics such as the uncompressed data size can be used for better reducer estimation. Other statistics, such as the number of rows, the number of distinct values per column, and the average column length, can be used by a Cost Based Optimizer (CBO) to choose better query plans.

      As a first step toward improving query planning, the statistics available in the metastore should be attached to the Hive operator tree: the operator tree is walked and each operator is annotated with statistics. The attached statistics vary for each operator depending on the operation it performs. For example, a select operator changes the average row size but does not affect the number of rows, whereas a filter operator changes the number of rows but does not change the average row size. Similar rules can be applied to the other operators.
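      A minimal Java sketch of the select and filter rules and of size-based reducer estimation, assuming a simplified statistics record that holds only a row count and an average row size. Stats, annotateSelect, annotateFilter, estimateReducers and walk are hypothetical names, not the classes in the patch; the selectivity, row sizes and bytes-per-reducer value are example inputs, and the operator tree is reduced to a linear chain for brevity (Java 16+ for records).

```java
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of per-operator statistics propagation.
 * Names are illustrative only; they are not the classes added by the patch.
 */
public class StatsAnnotationSketch {

    /** Minimal statistics: estimated row count and average row size in bytes. */
    record Stats(long numRows, double avgRowSize) {
        /** Total uncompressed data size implied by the two basic statistics. */
        double dataSize() {
            return numRows * avgRowSize;
        }
    }

    /** Select: same number of rows, but each row shrinks to the projected columns. */
    static Stats annotateSelect(Stats in, double projectedRowSize) {
        return new Stats(in.numRows(), projectedRowSize);
    }

    /** Filter: row count scaled by an assumed selectivity, average row size unchanged. */
    static Stats annotateFilter(Stats in, double selectivity) {
        return new Stats(Math.round(in.numRows() * selectivity), in.avgRowSize());
    }

    /** Reducers = ceil(dataSize / bytesPerReducer), clamped to [1, maxReducers]. */
    static int estimateReducers(double dataSize, long bytesPerReducer, int maxReducers) {
        int reducers = (int) Math.ceil(dataSize / bytesPerReducer);
        return Math.max(1, Math.min(maxReducers, reducers));
    }

    /** Walks a linear operator chain bottom-up, deriving each operator's stats from its child. */
    static Stats walk(Stats leaf, List<UnaryOperator<Stats>> operators) {
        Stats current = leaf;
        for (UnaryOperator<Stats> op : operators) {
            current = op.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Example table-scan statistics, standing in for values read from the metastore.
        Stats scan = new Stats(1_000_000L, 200.0);
        List<UnaryOperator<Stats>> plan = List.of(
                s -> annotateFilter(s, 0.1),    // WHERE clause keeping about 10% of rows
                s -> annotateSelect(s, 50.0));  // projection down to 50 bytes per row
        Stats out = walk(scan, plan);
        System.out.println(out); // Stats[numRows=100000, avgRowSize=50.0]
        System.out.println(estimateReducers(out.dataSize(), 1_000_000L, 999)); // 5
    }
}
```

      The filter leaves the row size alone and the select leaves the row count alone, so the data size reaching the reducers (100,000 rows x 50 bytes) is much smaller than the scanned size, which is what a better reducer estimate would exploit.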

      The rules for the different operators are documented as comments in the code. For more detail, see the reference I am using: "Database Systems: The Complete Book" by Garcia-Molina et al.
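      A sketch of the kind of cardinality rules that book describes, assuming uniformly distributed values and independent predicates. Here T(R) is the number of rows in R and V(R,a) is the number of distinct values of attribute a in R; the class and method names are hypothetical and are not taken from the patch.

```java
/**
 * Hypothetical sketch of textbook cardinality-estimation rules.
 * T(R) = rows in R, V(R,a) = distinct values of attribute a in R.
 */
public class CardinalityRules {

    /** sigma_{a = c}(R): rows spread uniformly over V(R,a) values, so T(R) / V(R,a). */
    static double equalitySelect(double tR, double vRa) {
        return tR / vRa;
    }

    /** sigma_{a < c}(R): rule of thumb that about one third of the rows qualify. */
    static double inequalitySelect(double tR) {
        return tR / 3.0;
    }

    /** AND of predicates assumed independent: multiply the individual selectivities. */
    static double conjunctionSelect(double tR, double... selectivities) {
        double rows = tR;
        for (double s : selectivities) {
            rows *= s;
        }
        return rows;
    }

    /** Equi-join of R and S on Y: T(R) * T(S) / max(V(R,Y), V(S,Y)). */
    static double equiJoin(double tR, double tS, double vRY, double vSY) {
        return tR * tS / Math.max(vRY, vSY);
    }

    public static void main(String[] args) {
        System.out.println(equalitySelect(10_000, 50));          // 200.0
        System.out.println(inequalitySelect(9_000));             // 3000.0
        System.out.println(conjunctionSelect(10_000, 0.5, 0.5)); // 2500.0
        System.out.println(equiJoin(10_000, 2_000, 100, 400));   // 50000.0
    }
}
```

      These estimates need exactly the column-level statistics (row counts and distinct-value counts) that the metastore already holds, which is why attaching them to the operator tree is a prerequisite for cost-based planning.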

      Attachments

        1. HIVE-5369.1.txt
          750 kB
          Prasanth Jayachandran
        2. HIVE-5369.10.patch
          1.29 MB
          Prasanth Jayachandran
        3. HIVE-5369.2.patch.txt
          725 kB
          Prasanth Jayachandran
        4. HIVE-5369.2.WIP.txt
          874 kB
          Prasanth Jayachandran
        5. HIVE-5369.3.patch.txt
          718 kB
          Prasanth Jayachandran
        6. HIVE-5369.4.patch.txt
          796 kB
          Prasanth Jayachandran
        7. HIVE-5369.5.patch.txt
          800 kB
          Prasanth Jayachandran
        8. HIVE-5369.6.patch.txt
          803 kB
          Prasanth Jayachandran
        9. HIVE-5369.7.patch.txt
          1.23 MB
          Prasanth Jayachandran
        10. HIVE-5369.8.patch.txt
          1.27 MB
          Prasanth Jayachandran
        11. HIVE-5369.9.patch
          1.29 MB
          Gunther Hagleitner
        12. HIVE-5369.9.patch.txt
          1.29 MB
          Prasanth Jayachandran
        13. HIVE-5369.refactor.WIP.txt
          700 kB
          Prasanth Jayachandran
        14. HIVE-5369.WIP.txt
          146 kB
          Prasanth Jayachandran


          People

            Assignee: Prasanth Jayachandran (prasanth_j)
            Reporter: Prasanth Jayachandran (prasanth_j)
            Votes: 0
            Watchers: 7
