  1. Hive
  2. HIVE-5369 Annotate hive operator tree with statistics from metastore
  3. HIVE-7589

Some fixes and improvements to statistics annotation rules

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: Query Processor, Statistics
    • Labels: None

    Description

      FIXES:
      1) JOIN rule does not properly propagate the column statistics from its parent
      2) Multi-way JOIN rule computes the wrong denominator for the row-count estimate
      3) GROUPBY rule does not account for the data size of the aggregate column
      4) Prefix removal from column names isn't working
      5) GROUPBY rule looks up column statistics for the aggregate column in its parent, finds none, and wrongly assumes the PARTIAL column stats state
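
      For context on fix 2, here is a minimal sketch of the common textbook estimate for an equi-join on a single shared key: multiply the input row counts, then divide by the product of every join-key distinct-value count (NDV) except the smallest. This is not taken from the patch and is not necessarily the formula Hive uses; the class name, method name, and parameters are hypothetical. With two inputs the denominator reduces to the larger of the two NDVs.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code from HIVE-7589.
final class JoinRowEstimateSketch {

    /**
     * Textbook row-count estimate for an n-way equi-join on one shared key.
     * rowCounts[i] is the row count of input i, keyNdvs[i] is the number of
     * distinct join-key values in input i.
     */
    static long estimateRows(long[] rowCounts, long[] keyNdvs) {
        if (rowCounts.length != keyNdvs.length || rowCounts.length < 2) {
            throw new IllegalArgumentException("need matching arrays with at least two inputs");
        }
        // Accumulate in double so intermediate products cannot wrap around.
        double numerator = 1.0;
        for (long rows : rowCounts) {
            numerator *= rows;
        }
        // Denominator: product of all NDVs except the smallest one.
        long[] sorted = keyNdvs.clone();
        Arrays.sort(sorted);
        double denominator = 1.0;
        for (int i = 1; i < sorted.length; i++) {
            denominator *= Math.max(1L, sorted[i]); // guard against a zero NDV
        }
        // A double-to-long cast in Java saturates at Long.MAX_VALUE.
        return (long) (numerator / denominator);
    }

    public static void main(String[] args) {
        // Three inputs: (1000 * 500 * 200) / (50 * 100)
        System.out.println(estimateRows(new long[] {1000, 500, 200}, new long[] {100, 50, 20}));
    }
}
```

      Under these assumptions, getting the denominator wrong (for example, dividing by only one NDV in a three-way join) inflates the estimate by orders of magnitude, which is the kind of error a multi-way rule has to avoid.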

      IMPROVEMENTS:
      1) Replace "EXPLAIN EXTENDED" with "EXPLAIN" in test cases to make the golden files easier to read and less verbose
      2) Introduce a rule for the ReduceSink operator that only renames column statistics to match the output row schema
      3) Add more rows to the test datasets to avoid the zero-row scenario in join test cases
      4) Improve the JOIN rule to avoid overflow of long values
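
      To illustrate the overflow concern in improvement 4: multiplying two large row counts as Java long values can silently wrap around to a negative number. One generic way to guard against that is a saturating multiply, sketched below. This is an assumption about the general technique, not the code in the patch; the class and method names are hypothetical.

```java
// Hypothetical illustration only; not code from HIVE-7589.
final class SafeMathSketch {

    /**
     * Multiplies two longs, clamping to Long.MAX_VALUE or Long.MIN_VALUE
     * instead of wrapping around when the exact result does not fit.
     */
    static long saturatingMultiply(long a, long b) {
        try {
            // Math.multiplyExact (Java 8+) throws ArithmeticException on overflow.
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException overflow) {
            // Same signs overflow toward positive, different signs toward negative.
            return ((a < 0) == (b < 0)) ? Long.MAX_VALUE : Long.MIN_VALUE;
        }
    }

    public static void main(String[] args) {
        long left = 3_000_000_000L;
        long right = 4_000_000_000L;
        System.out.println(left * right);                    // wraps to a negative value
        System.out.println(saturatingMultiply(left, right)); // clamps to Long.MAX_VALUE
    }
}
```

      A clamped estimate is still inexact, but it stays positive and ordered correctly against smaller estimates, which matters when downstream decisions compare row counts.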

      Attachments

        1. HIVE-7589.1.patch
          662 kB
          Prasanth Jayachandran
        2. HIVE-7589.2.patch
          663 kB
          Prasanth Jayachandran
        3. HIVE-7589.3.patch
          714 kB
          Prasanth Jayachandran
        4. HIVE-7589.4.patch
          713 kB
          Prasanth Jayachandran
        5. HIVE-7589.5.patch
          727 kB
          Prasanth Jayachandran

            People

              Assignee: Prasanth Jayachandran (prasanth_j)
              Reporter: Prasanth Jayachandran (prasanth_j)
              Votes: 0
              Watchers: 3
