Hive / HIVE-5775 Introduce Cost Based Optimizer to Hive / HIVE-7324

CBO: provide a mechanism to test CBO features based on table stats only (w/o table data)

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: CBO
    • Labels: None

      Description

      Since much of the CBO work is focused on planning, it would be useful to run EXPLAIN on a query to test CBO features. TPC-DS has a rich enough schema and query set, so the patch loads a dump of TPC-DS (scale 10000) stats.

      1. TestCBO shows a way to load stats from a dump and run EXPLAIN on a TPC-DS query. The output is currently written to System.out. This could be improved by hooking into QTestUtil, but hopefully this is a good start.

      2. Testing this uncovered a couple of issues:
      a) PartitionPruner fails on 'true' constants. For example, the following query produces an error:

      SELECT * 
      FROM t WHERE
      partCol < 100 AND true
      

      This surfaces because the predicates coming out of Optiq can contain 'true' constants.
      b) OpTraitsRulesProcFactory:checkBucketedTable checks that the number of files equals numBuckets. This fails because there are no data files. I have altered it to catch exceptions and assume bucketMapJoinConvertible = false if one is encountered here.
      Both changes are included in this patch for now; I will carve them out as separate patches.

      Ashutosh Chauhan, Gunther Hagleitner, can you please take a look?
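
      The two issues in item 2 can be illustrated in isolation. What follows is a minimal sketch, not the code in the patch: the class StatsOnlyPlanningSketch, both method names, the representation of conjuncts as plain strings, and the Callable used to count files are all hypothetical. Dropping 'true' conjuncts before pruning is only one plausible way to sidestep (a); the description does not say how PartitionPruner should be fixed. The fallback in (b) mirrors the behavior described above: any failure while counting files is treated as bucketMapJoinConvertible = false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical illustration only; none of these names exist in Hive.
final class StatsOnlyPlanningSketch {

    private StatsOnlyPlanningSketch() {
    }

    /**
     * (a) Assumed workaround: given the conjuncts of a flat AND predicate,
     * drop literal 'true' terms so that only meaningful predicates reach
     * the pruning step. An empty result means "no filter".
     */
    static List<String> dropTrueConjuncts(List<String> conjuncts) {
        List<String> kept = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (!"true".equalsIgnoreCase(conjunct.trim())) {
                kept.add(conjunct);
            }
        }
        return kept;
    }

    /**
     * (b) Compare the number of data files with numBuckets. If counting
     * the files fails for any reason (for example, the table has stats
     * but no data files), report the table as not convertible instead of
     * propagating the exception.
     */
    static boolean bucketMapJoinConvertible(Callable<Integer> fileCounter, int numBuckets) {
        try {
            return fileCounter.call() == numBuckets;
        } catch (Exception e) {
            // Stats-only setup: fall back to the conservative answer.
            return false;
        }
    }
}
```

      With this sketch, the conjuncts of the query above ("partCol < 100" and "true") reduce to the single conjunct "partCol < 100", and a file counter that throws yields false rather than failing planning.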

      1. HIVE-7324.2.patch
        9.45 MB
        Gunther Hagleitner
      2. HIVE-7324.1.patch
        9.45 MB
        Harish Butani

        Activity

        Damien Carol made changes -
        Component/s CBO [ 12323402 ]
        Gunther Hagleitner made changes -
        Attachment HIVE-7324.2.patch [ 12656906 ]
        Damien Carol made changes -
        Description (SQL example reformatted as a {code:sql} block; wording unchanged)

        Harish Butani made changes -
        Field Original Value New Value
        Attachment HIVE-7324.1.patch [ 12653321 ]
        Harish Butani created issue -

          People

          • Assignee:
            Harish Butani
            Reporter:
            Harish Butani
          • Votes:
            0
            Watchers:
            2
