Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-2497

partition pruning prune some right partition under specific conditions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
    • Hadoop Flags:
      Reviewed

      Description

      create table src3(key string, value string) partitioned by (pt string)
      row format delimited fields terminated by ',';

      ALTER TABLE src3 ADD IF NOT EXISTS PARTITION (pt='20110911000000') ;
      ALTER TABLE src3 ADD IF NOT EXISTS PARTITION (pt='20110912000000') ;
      ALTER TABLE src3 ADD IF NOT EXISTS PARTITION (pt='20110913000000') ;

      explain extended
      select user_id
      from
      (
      select
      cast(key as int) as user_id
      ,case when (value like 'aaa%' or value like 'vvv%')
      then 1
      else 0 end as tag_student
      from src3
      ) sub
      where sub.tag_student > 0;

      STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 is a root stage

      STAGE PLANS:
      Stage: Stage-1
      Map Reduce
      Alias -> Map Operator Tree:
      sub:src3
      TableScan
      alias: src3
      Filter Operator
      isSamplingPred: false
      predicate:
      expr: (CASE WHEN (((value like 'aaa%') or (value like 'vvv%'))) THEN (1) ELSE (0) END > 0)
      type: boolean
      Select Operator
      expressions:
      expr: UDFToInteger(key)
      type: int
      expr: CASE WHEN (((value like 'aaa%') or (value like 'vvv%'))) THEN (1) ELSE (0) END
      type: int
      outputColumnNames: _col0, _col1
      Filter Operator
      isSamplingPred: false
      predicate:
      expr: (_col1 > 0)
      type: boolean
      Select Operator
      expressions:
      expr: _col0
      type: int
      outputColumnNames: _col0
      File Output Operator
      compressed: false
      GlobalTableId: 0
      directory: hdfs://localhost:54310/tmp/hive-tianzhao/hive_2011-10-11_19-26-12_894_9085644225727185586/-ext-10001
      NumFilesPerFileSink: 1
      table:
      input format: org.apache.hadoop.mapred.TextInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      properties:
      columns _col0
      columns.types int
      serialization.format 1
      TotalFiles: 1
      MultiFileSpray: false
      Needs Tagging: false

      Stage: Stage-0
      Fetch Operator
      limit: -1

      if we set hive.optimize.ppd=false;

      STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 is a root stage

      STAGE PLANS:
      Stage: Stage-1
      Map Reduce
      Alias -> Map Operator Tree:
      sub:src3
      TableScan
      alias: src3
      Select Operator
      expressions:
      expr: UDFToInteger(key)
      type: int
      expr: CASE WHEN (((value like 'aaa%') or (value like 'vvv%'))) THEN (1) ELSE (0) END
      type: int
      outputColumnNames: _col0, _col1
      Filter Operator
      isSamplingPred: false
      predicate:
      expr: (_col1 > 0)
      type: boolean
      Select Operator
      expressions:
      expr: _col0
      type: int
      outputColumnNames: _col0
      File Output Operator
      compressed: false
      GlobalTableId: 0
      directory: hdfs://localhost:54310/tmp/hive-tianzhao/hive_2011-10-11_19-27-22_527_1729287213481398480/-ext-10001
      NumFilesPerFileSink: 1
      table:
      input format: org.apache.hadoop.mapred.TextInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      properties:
      columns _col0
      columns.types int
      serialization.format 1
      TotalFiles: 1
      MultiFileSpray: false
      Needs Tagging: false
      Path -> Alias:
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110911000000 [sub:src3]
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110912000000 [sub:src3]
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110913000000 [sub:src3]
      Path -> Partition:
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110911000000
      Partition
      base file name: pt=20110911000000

      1. hive-2497.patch
        1 kB
        binlijin
      2. hive-2497.2.patch
        22 kB
        binlijin

        Activity

        Hide
        hudson Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1032 (See https://builds.apache.org/job/Hive-trunk-h0.21/1032/)
        HIVE-2497 partition pruning prune some right partition under specific conditions
        (binlijin via namit)

        • begin PUBLIC platform impact section -
          Bugzilla: #
        • end platform impact -

        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188507
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/ExprProcFactory.java
        • /hive/trunk/ql/src/test/queries/clientpositive/ppr_allchildsarenull.q
        • /hive/trunk/ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out
        Show
        hudson Hudson added a comment - Integrated in Hive-trunk-h0.21 #1032 (See https://builds.apache.org/job/Hive-trunk-h0.21/1032/ ) HIVE-2497 partition pruning prune some right partition under specific conditions (binlijin via namit) begin PUBLIC platform impact section - Bugzilla: # end platform impact - namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188507 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/ExprProcFactory.java /hive/trunk/ql/src/test/queries/clientpositive/ppr_allchildsarenull.q /hive/trunk/ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out
        Hide
        namit Namit Jain added a comment -

        Committed. Thanks binlijin

        Show
        namit Namit Jain added a comment - Committed. Thanks binlijin
        Hide
        namit Namit Jain added a comment -

        +1

        Show
        namit Namit Jain added a comment - +1
        Hide
        namit Namit Jain added a comment -

        reviewing right now

        Show
        namit Namit Jain added a comment - reviewing right now
        Hide
        aoxiang binlijin added a comment -

        Ashutosh Chauhan,
        the problem is reproducible on trunk, 0.8 branch, 0.7 branch that set hive.optimize.ppd=true;
        the patch is for the version that uses PredicatePushDown and PartitionPruner, so 0.7 branch, 0.8 branch, and trunk they all can use the patch.

        Show
        aoxiang binlijin added a comment - Ashutosh Chauhan, the problem is reproducible on trunk, 0.8 branch, 0.7 branch that set hive.optimize.ppd=true; the patch is for the version that uses PredicatePushDown and PartitionPruner, so 0.7 branch, 0.8 branch, and trunk they all can use the patch.
        Hide
        ashutoshc Ashutosh Chauhan added a comment -

        Binlijin,
        Is problem reproducible on a)trunk b) 0.8 branch? Is your patch targeted for 0.7 branch?

        Show
        ashutoshc Ashutosh Chauhan added a comment - Binlijin, Is problem reproducible on a)trunk b) 0.8 branch? Is your patch targeted for 0.7 branch?
        Hide
        aoxiang binlijin added a comment -

        Ashutosh Chauhan, I add a test case in hive-2497.2.patch.
        Because I use a version does not have PartitionConditionRemover, having PartitionConditionRemover and not having PartitionConditionRemover the situation is different, so the test case is not the one in above Description.

        Show
        aoxiang binlijin added a comment - Ashutosh Chauhan, I add a test case in hive-2497.2.patch. Because I use a version does not have PartitionConditionRemover, having PartitionConditionRemover and not having PartitionConditionRemover the situation is different, so the test case is not the one in above Description.
        Hide
        ashutoshc Ashutosh Chauhan added a comment -

        Binlijin,
        It will be good to also include a test case in the patch for regression purposes.

        Show
        ashutoshc Ashutosh Chauhan added a comment - Binlijin, It will be good to also include a test case in the patch for regression purposes.
        Hide
        aoxiang binlijin added a comment -

        PredicatePushDown will push the predicate sub.tag_student > 0 down following TableScanOperator. When partition pruning , try to iterate all partitions , the filter condition is :
        case when (null or null)
        then 1
        else 0
        > 0
        The case when return 0, so all partitions are pruned.

        Show
        aoxiang binlijin added a comment - PredicatePushDown will push the predicate sub.tag_student > 0 down following TableScanOperator. When partition pruning , try to iterate all partitions , the filter condition is : case when (null or null) then 1 else 0 > 0 The case when return 0, so all partitions are pruned.
        Hide
        coderplay Min Zhou added a comment -

        We found partition pruning would incorrectly prune some partitions the MR job needs by running a sql like below

         
        select user_id 
        from
        (
          select 
            cast(key as int) as user_id,
            case when (value like 'aaa%' or value like 'vvv%') 
              then 1 
              else 0 
              end as tag_student
          from src3
        ) sub
        where sub.tag_student > 0;
        

        The patch should fix this problem.

        Show
        coderplay Min Zhou added a comment - We found partition pruning would incorrectly prune some partitions the MR job needs by running a sql like below select user_id from ( select cast(key as int) as user_id, case when (value like 'aaa%' or value like 'vvv%') then 1 else 0 end as tag_student from src3 ) sub where sub.tag_student > 0; The patch should fix this problem.

          People

          • Assignee:
            aoxiang binlijin
            Reporter:
            aoxiang binlijin
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development