Hive
  1. Hive
  2. HIVE-2497

partition pruning prune some right partition under specific conditions

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
    • Hadoop Flags:
      Reviewed

      Description

      create table src3(key string, value string) partitioned by (pt string)
      row format delimited fields terminated by ',';

      ALTER TABLE src3 ADD IF NOT EXISTS PARTITION (pt='20110911000000') ;
      ALTER TABLE src3 ADD IF NOT EXISTS PARTITION (pt='20110912000000') ;
      ALTER TABLE src3 ADD IF NOT EXISTS PARTITION (pt='20110913000000') ;

      explain extended
      select user_id
      from
      (
      select
      cast(key as int) as user_id
      ,case when (value like 'aaa%' or value like 'vvv%')
      then 1
      else 0 end as tag_student
      from src3
      ) sub
      where sub.tag_student > 0;

      STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 is a root stage

      STAGE PLANS:
      Stage: Stage-1
      Map Reduce
      Alias -> Map Operator Tree:
      sub:src3
      TableScan
      alias: src3
      Filter Operator
      isSamplingPred: false
      predicate:
      expr: (CASE WHEN (((value like 'aaa%') or (value like 'vvv%'))) THEN (1) ELSE (0) END > 0)
      type: boolean
      Select Operator
      expressions:
      expr: UDFToInteger(key)
      type: int
      expr: CASE WHEN (((value like 'aaa%') or (value like 'vvv%'))) THEN (1) ELSE (0) END
      type: int
      outputColumnNames: _col0, _col1
      Filter Operator
      isSamplingPred: false
      predicate:
      expr: (_col1 > 0)
      type: boolean
      Select Operator
      expressions:
      expr: _col0
      type: int
      outputColumnNames: _col0
      File Output Operator
      compressed: false
      GlobalTableId: 0
      directory: hdfs://localhost:54310/tmp/hive-tianzhao/hive_2011-10-11_19-26-12_894_9085644225727185586/-ext-10001
      NumFilesPerFileSink: 1
      table:
      input format: org.apache.hadoop.mapred.TextInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      properties:
      columns _col0
      columns.types int
      serialization.format 1
      TotalFiles: 1
      MultiFileSpray: false
      Needs Tagging: false

      Stage: Stage-0
      Fetch Operator
      limit: -1

      if we set hive.optimize.ppd=false;

      STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 is a root stage

      STAGE PLANS:
      Stage: Stage-1
      Map Reduce
      Alias -> Map Operator Tree:
      sub:src3
      TableScan
      alias: src3
      Select Operator
      expressions:
      expr: UDFToInteger(key)
      type: int
      expr: CASE WHEN (((value like 'aaa%') or (value like 'vvv%'))) THEN (1) ELSE (0) END
      type: int
      outputColumnNames: _col0, _col1
      Filter Operator
      isSamplingPred: false
      predicate:
      expr: (_col1 > 0)
      type: boolean
      Select Operator
      expressions:
      expr: _col0
      type: int
      outputColumnNames: _col0
      File Output Operator
      compressed: false
      GlobalTableId: 0
      directory: hdfs://localhost:54310/tmp/hive-tianzhao/hive_2011-10-11_19-27-22_527_1729287213481398480/-ext-10001
      NumFilesPerFileSink: 1
      table:
      input format: org.apache.hadoop.mapred.TextInputFormat
      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      properties:
      columns _col0
      columns.types int
      serialization.format 1
      TotalFiles: 1
      MultiFileSpray: false
      Needs Tagging: false
      Path -> Alias:
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110911000000 [sub:src3]
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110912000000 [sub:src3]
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110913000000 [sub:src3]
      Path -> Partition:
      hdfs://localhost:54310/user/hive/warehouse/src3/pt=20110911000000
      Partition
      base file name: pt=20110911000000

      1. hive-2497.2.patch
        22 kB
        binlijin
      2. hive-2497.patch
        1 kB
        binlijin

        Activity

        binlijin created issue -
        binlijin made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        binlijin made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        binlijin made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Labels patch
        binlijin made changes -
        Attachment hive-2497.patch [ 12498699 ]
        Hide
        Min Zhou added a comment -

        We found partition pruning would incorrectly prune some partitions the MR job needs by running a sql like below

         
        select user_id 
        from
        (
          select 
            cast(key as int) as user_id,
            case when (value like 'aaa%' or value like 'vvv%') 
              then 1 
              else 0 
              end as tag_student
          from src3
        ) sub
        where sub.tag_student > 0;
        

        The patch should fix this problem.

        Show
        Min Zhou added a comment - We found partition pruning would incorrectly prune some partitions the MR job needs by running a sql like below select user_id from ( select cast(key as int) as user_id, case when (value like 'aaa%' or value like 'vvv%') then 1 else 0 end as tag_student from src3 ) sub where sub.tag_student > 0; The patch should fix this problem.
        Hide
        binlijin added a comment -

        PredicatePushDown will push the predicate sub.tag_student > 0 down following TableScanOperator. When partition pruning , try to iterate all partitions , the filter condition is :
        case when (null or null)
        then 1
        else 0
        > 0
        The case when return 0, so all partitions are pruned.

        Show
        binlijin added a comment - PredicatePushDown will push the predicate sub.tag_student > 0 down following TableScanOperator. When partition pruning , try to iterate all partitions , the filter condition is : case when (null or null) then 1 else 0 > 0 The case when return 0, so all partitions are pruned.
        Hide
        Ashutosh Chauhan added a comment -

        Binlijin,
        It will be good to also include a test case in the patch for regression purposes.

        Show
        Ashutosh Chauhan added a comment - Binlijin, It will be good to also include a test case in the patch for regression purposes.
        binlijin made changes -
        Attachment hive-2497.2.patch [ 12499264 ]
        Hide
        binlijin added a comment -

        Ashutosh Chauhan, I add a test case in hive-2497.2.patch.
        Because I use a version does not have PartitionConditionRemover, having PartitionConditionRemover and not having PartitionConditionRemover the situation is different, so the test case is not the one in above Description.

        Show
        binlijin added a comment - Ashutosh Chauhan, I add a test case in hive-2497.2.patch. Because I use a version does not have PartitionConditionRemover, having PartitionConditionRemover and not having PartitionConditionRemover the situation is different, so the test case is not the one in above Description.
        Hide
        Ashutosh Chauhan added a comment -

        Binlijin,
        Is problem reproducible on a)trunk b) 0.8 branch? Is your patch targeted for 0.7 branch?

        Show
        Ashutosh Chauhan added a comment - Binlijin, Is problem reproducible on a)trunk b) 0.8 branch? Is your patch targeted for 0.7 branch?
        Hide
        binlijin added a comment -

        Ashutosh Chauhan,
        the problem is reproducible on trunk, 0.8 branch, 0.7 branch that set hive.optimize.ppd=true;
        the patch is for the version that uses PredicatePushDown and PartitionPruner, so 0.7 branch, 0.8 branch, and trunk they all can use the patch.

        Show
        binlijin added a comment - Ashutosh Chauhan, the problem is reproducible on trunk, 0.8 branch, 0.7 branch that set hive.optimize.ppd=true; the patch is for the version that uses PredicatePushDown and PartitionPruner, so 0.7 branch, 0.8 branch, and trunk they all can use the patch.
        John Sichi made changes -
        Assignee binlijin [ aoxiang ]
        Hide
        Namit Jain added a comment -

        reviewing right now

        Show
        Namit Jain added a comment - reviewing right now
        Hide
        Namit Jain added a comment -

        +1

        Show
        Namit Jain added a comment - +1
        Hide
        Namit Jain added a comment -

        Committed. Thanks binlijin

        Show
        Namit Jain added a comment - Committed. Thanks binlijin
        Namit Jain made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1032 (See https://builds.apache.org/job/Hive-trunk-h0.21/1032/)
        HIVE-2497 partition pruning prune some right partition under specific conditions
        (binlijin via namit)

        • begin PUBLIC platform impact section -
          Bugzilla: #
        • end platform impact -

        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188507
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/ExprProcFactory.java
        • /hive/trunk/ql/src/test/queries/clientpositive/ppr_allchildsarenull.q
        • /hive/trunk/ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out
        Show
        Hudson added a comment - Integrated in Hive-trunk-h0.21 #1032 (See https://builds.apache.org/job/Hive-trunk-h0.21/1032/ ) HIVE-2497 partition pruning prune some right partition under specific conditions (binlijin via namit) begin PUBLIC platform impact section - Bugzilla: # end platform impact - namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188507 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/ExprProcFactory.java /hive/trunk/ql/src/test/queries/clientpositive/ppr_allchildsarenull.q /hive/trunk/ql/src/test/results/clientpositive/ppr_allchildsarenull.q.out
        Carl Steinbach made changes -
        Fix Version/s 0.8.0 [ 12316178 ]
        Carl Steinbach made changes -
        Fix Version/s 0.9.0 [ 12317742 ]
        Carl Steinbach made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            binlijin
            Reporter:
            binlijin
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development