Hive / HIVE-5775 Introduce Cost Based Optimizer to Hive / HIVE-7324

CBO: provide a mechanism to test CBO features based on table stats only (w/o table data)


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: CBO
    • Labels: None

    Description

      Since a lot of the CBO work is focused on planning, it would be nice to be able to run EXPLAIN on queries to test CBO features. TPCDS has a rich enough schema and query set, so the patch loads a dump of TPCDS (scale 10000) stats.

      1. TestCBO shows a way to load stats from a dump and run EXPLAIN on a TPCDS query. The output is currently written to System.out. This could be improved by hooking into QTestUtil, but hopefully it is a good start.

      2. Testing this uncovered a couple of issues:
      a) PartitionPruner fails on 'true' constants. For example, you will get an error for:

      SELECT *
      FROM t
      WHERE partCol < 100 AND true

      This gets exposed because the predicates coming out of Optiq can contain 'true' constants.
      b) OpTraitsRulesProcFactory:checkBucketedTable checks that the number of files equals numBuckets. This fails because there are no data files. So I have altered it to catch exceptions and assume bucketMapJoinConvertible = false if an exception is encountered there.
      I am uploading these changes in this patch for now and will carve them out as separate patches.
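The fallback described in 2b can be illustrated with a minimal sketch. It assumes a hypothetical `FileLister` interface standing in for whatever lists a table's data files, and a hypothetical `BucketCheck.isBucketMapJoinConvertible` method; neither is Hive's actual OpTraitsRulesProcFactory code. It only shows the pattern: compare the file count with numBuckets, and treat any exception during listing as "not convertible" instead of failing the plan.

```java
import java.util.List;

// Hypothetical stand-in for the code that lists a table's (or partition's)
// data files; real listing code may throw checked exceptions such as IOException.
interface FileLister {
    List<String> list() throws Exception;
}

final class BucketCheck {
    // Hypothetical helper: true only if listing succeeds and the number of
    // files equals the declared bucket count.
    static boolean isBucketMapJoinConvertible(int numBuckets, FileLister lister) {
        try {
            return lister.list().size() == numBuckets;
        } catch (Exception e) {
            // Stats-only tables have no data files, so listing can fail.
            // Fall back to "not convertible" rather than aborting planning.
            return false;
        }
    }
}
```

With stats loaded but no data on disk, the lister throws, the method returns false, and planning simply proceeds without a bucket map join.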

      ashutoshc, hagleitn: can you please take a look?
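The description does not say how PartitionPruner should handle the 'true' constants from issue 2a. As an assumption, one option is to fold constant-true conjuncts away before pruning, so `partCol < 100 AND true` reduces to `partCol < 100`. The Java 16+ sketch below uses hypothetical `Pred`, `Const`, `Cmp`, `And`, and `TrueFolding` types, not Hive's expression classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical predicate model for illustration only (not Hive's ExprNodeDesc).
interface Pred {}

record Const(boolean value) implements Pred {}

record Cmp(String column, String op, int literal) implements Pred {}

record And(List<Pred> children) implements Pred {}

final class TrueFolding {
    // Removes constant-true conjuncts (x AND true == x), short-circuits to
    // false if any conjunct is constant false, and unwraps an AND that is
    // left with a single child.
    static Pred fold(Pred p) {
        if (!(p instanceof And conj)) {
            return p; // only AND nodes are simplified in this sketch
        }
        List<Pred> kept = new ArrayList<>();
        for (Pred child : conj.children()) {
            Pred folded = fold(child);
            if (folded instanceof Const c) {
                if (!c.value()) {
                    return new Const(false); // x AND false == false
                }
                continue; // drop the 'true' conjunct
            }
            kept.add(folded);
        }
        if (kept.isEmpty()) {
            return new Const(true); // every conjunct was 'true'
        }
        return kept.size() == 1 ? kept.get(0) : new And(kept);
    }
}
```

Folding before pruning keeps the pruner's input free of bare boolean literals; whether Hive should instead teach PartitionPruner to accept them directly is a separate design choice.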

      Attachments

        1. HIVE-7324.2.patch
          9.45 MB
          Gunther Hagleitner
        2. HIVE-7324.1.patch
          9.45 MB
          Harish Butani


          People

            Assignee: rhbutani Harish Butani
            Reporter: rhbutani Harish Butani
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: