HIVE-7324 (sub-task of HIVE-5775: Introduce Cost Based Optimizer to Hive)

CBO: provide a mechanism to test CBO features based on table stats only (w/o table data)


    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: CBO
    • Labels:


      Since a lot of the CBO work is focused on planning, it would be useful to be able to run EXPLAIN on a query to test CBO features without any table data. TPC-DS has a rich enough schema and query set, so the patch loads a dump of TPC-DS (scale 10000) stats.

      1. TestCBO shows a way to load stats from a dump and run EXPLAIN on a TPC-DS query. The output is currently written to System.out. This could be improved by hooking into QTestUtil, but hopefully this is a good start.

      2. Testing this uncovered a couple of issues:
      a) PartitionPruner fails on 'true' constants. For example, the following query produces an error:

      SELECT *
      FROM t
      WHERE partCol < 100 AND true

      This is exposed because the predicates coming out of Optiq can contain 'true' predicates.
      b) OpTraitsRulesProcFactory:checkBucketedTable checks that the number of files equals numBuckets. This fails because there are no data files, so I have altered it to catch exceptions and assume bucketMapJoinConvertible = false if an exception is encountered here.
      I am uploading these changes as part of this patch for now and will carve them out as separate patches.
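
The two workarounds described in 2a and 2b can be illustrated with a minimal, self-contained Java sketch. This is not the code from the patch and does not use Hive's actual classes: the class name CboStatsOnlySketch, both method names, and the representation of predicates as strings are all hypothetical, chosen only to show the idea of (a) dropping literal 'true' conjuncts before pruning and (b) treating a failed file-count check as "not convertible".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical illustration only; none of these names are Hive APIs.
final class CboStatsOnlySketch {

    // (a) Remove literal 'true' conjuncts from an AND-ed list of predicates.
    // An empty result means the whole predicate is trivially true.
    static List<String> dropTrueConjuncts(List<String> conjuncts) {
        List<String> kept = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (!"true".equalsIgnoreCase(conjunct.trim())) {
                kept.add(conjunct);
            }
        }
        return kept;
    }

    // (b) Compare the file count with the bucket count. If the count cannot be
    // obtained (for example, because there are no data files), assume the
    // table is not bucket-map-join convertible instead of failing the plan.
    static boolean bucketMapJoinConvertible(Callable<Integer> fileCount, int numBuckets) {
        try {
            return fileCount.call() == numBuckets;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Mirrors the query above: partCol < 100 AND true
        System.out.println(dropTrueConjuncts(Arrays.asList("partCol < 100", "true")));

        // Stats-only table: listing data files throws, so the check falls back to false.
        System.out.println(bucketMapJoinConvertible(() -> {
            throw new java.io.IOException("no data files");
        }, 4));
    }
}
```

Swallowing every exception in (b) is a deliberately conservative choice for a stats-only test setup; a production change would probably want to log the exception or catch a narrower type.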

      Ashutosh Chauhan, Gunther Hagleitner, can you please take a look?

      1. HIVE-7324.2.patch
        9.45 MB
        Gunther Hagleitner
      2. HIVE-7324.1.patch
        9.45 MB
        Harish Butani



          • Assignee:
            Harish Butani
          • Votes:
            0
          • Watchers:
            2


            • Created: