Hive / HIVE-2344

filter is removed due to regression of HIVE-1538

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      When predicate pushdown is enabled, Hive previously pushed down predicates on non-deterministic function invocations incorrectly when those invocations were referenced indirectly through a nested SELECT list rather than directly in the filter expression. After this change, Hive no longer pushes down filters over indirect references to function invocations of any kind, regardless of determinism. Note that in Hive, even built-in operators such as + and CAST are treated as function invocations.
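The rule in the release note can be modelled in a few lines. The following is a minimal Python sketch, not Hive's code. `Col`, `Call`, `can_push_through_select` and the columns `x` and `y` are hypothetical names for illustration. Hive's real logic lives in the ppd package (ExprWalkerProcFactory/OpProcFactory). The model checks one thing: a filter may move below a SELECT only if every alias it references resolves to a plain column. Determinism is recorded but deliberately ignored.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Col:
    """Plain column reference in a SELECT list (hypothetical toy model)."""
    name: str


@dataclass(frozen=True)
class Call:
    """Any function invocation, including operators such as + and CAST."""
    fn: str
    args: Tuple["Expr", ...] = ()
    deterministic: bool = True  # recorded, but deliberately ignored by the rule below


Expr = Union[Col, Call]


def can_push_through_select(filter_aliases: Set[str], select_list: Dict[str, Expr]) -> bool:
    """Post-fix rule (as described in the release note): a filter may move below
    a SELECT only if every alias it references is a plain column, never a
    function invocation of any kind."""
    return all(isinstance(select_list[alias], Col) for alias in filter_aliases)


# SELECT type_bucket, cast(rand() as double) AS randum123, x + y AS x_plus_y ...
select_list: Dict[str, Expr] = {
    "type_bucket": Col("type_bucket"),
    "randum123": Call("CAST", (Call("rand", deterministic=False),)),
    "x_plus_y": Call("+", (Col("x"), Col("y"))),
}

print(can_push_through_select({"type_bucket"}, select_list))  # True
print(can_push_through_select({"randum123"}, select_list))    # False: non-deterministic call
print(can_push_through_select({"x_plus_y"}, select_list))     # False: deterministic, still a call
```

The `x_plus_y` case shows the trade-off the release note accepts. Even a deterministic expression blocks pushdown, which keeps the check simple and avoids evaluating the expression twice.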

      Description

      select * from
      (
      select type_bucket,randum123
      from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a
      where randum123 <=0.1)s where s.randum123>0.1 limit 20;

      This returns rows, although it should return none: no row can satisfy both randum123 <=0.1 and s.randum123>0.1.

      Also,

      explain
      select type_bucket,randum123
      from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a
      where randum123 <=0.1

      shows that there is no filter.
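The query above should be empty, yet it returns rows. A toy Python simulation shows why. It is not Hive's executor. `correct_plan` and `buggy_plan` are hypothetical, and the buggy plan's shape is an assumption inferred from the analysis later in this thread: the inner filter is re-evaluated as `rand() <= 0.1` below the SELECT, and its original copy is removed. The model also assumes the outer filter stays in place.

```python
import random
from typing import List, Tuple


def correct_plan(n_rows: int, rng: random.Random, limit: int = 20) -> List[Tuple[int, float]]:
    """rand() is evaluated once per row by the inner SELECT; both WHERE
    clauses then test that same randum123 value."""
    out = []
    for row in range(n_rows):
        randum123 = rng.random()                  # cast(rand() as double) AS randum123
        if randum123 <= 0.1 and randum123 > 0.1:  # inner and outer filters: contradictory
            out.append((row, randum123))
    return out[:limit]


def buggy_plan(n_rows: int, rng: random.Random, limit: int = 20) -> List[Tuple[int, float]]:
    """Assumed buggy shape: the inner filter runs below the SELECT on a fresh
    rand() call and is removed from its original place."""
    out = []
    for row in range(n_rows):
        if rng.random() <= 0.1:       # pushed-down copy of the inner filter
            randum123 = rng.random()  # the SELECT calls rand() again: a new value
            if randum123 > 0.1:       # outer filter (assumed to stay in place)
                out.append((row, randum123))
    return out[:limit]


print(len(correct_plan(10_000, random.Random(42))))  # 0
print(buggy_plan(10_000, random.Random(42))[:3])     # rows with randum123 > 0.1
```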

      Attachments

        1. ppd_udf_col.q.out.txt (14 kB, Amareshwari Sriramadasu)
        2. hive-patch-2344.txt (22 kB, Amareshwari Sriramadasu)
        3. hive-patch-2344-2.txt (25 kB, Amareshwari Sriramadasu)


          Activity

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-2791: filter is still removed due to regression of HIVE-1538 although HIVE-2344 (binlijin via hashutosh) (Revision 1291916)

          Result = ABORTED
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291916
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientpositive/ppd2.q
          • /hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1268 (See https://builds.apache.org/job/Hive-trunk-h0.21/1268/)
          HIVE-2791: filter is still removed due to regression of HIVE-1538 although HIVE-2344 (binlijin via hashutosh) (Revision 1291916)

          Result = SUCCESS
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291916
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientpositive/ppd2.q
          • /hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
          John Sichi added a comment -

          Committed. Thanks Amareshwari!

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1404/
          -----------------------------------------------------------

          (Updated 2011-08-10 17:06:46.444966)

          Review request for hive, John Sichi and Yongqiang He.

          Changes
          -------

          The filter on a 'udf selected as column alias in select' is no longer pushed beyond the select.

          Summary (updated)
          -------

          Any filter on a 'udf selected as column alias in select' is pushed down through the select operator, although it should not be.

          This addresses bug HIVE-2344.
          https://issues.apache.org/jira/browse/HIVE-2344

          Diffs (updated)


          trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java 1156069
          trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1156069
          trunk/ql/src/test/queries/clientpositive/ppd_udf_col.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/ppd_udf_col.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/1404/diff

          Testing (updated)
          -------

          All tests pass with the patch.

          Thanks,

          Amareshwari

          Amareshwari Sriramadasu added a comment -

          Here is a patch that makes the change: the filter on a 'udf as column in select' is no longer pushed beyond the select.

          All tests passed with the patch.

          John Sichi added a comment -

          It's possible to avoid the double computation (by pushing the selection down too, similar to column pruning), but I'm fine with skipping that and not pushing the expression beyond the select.

          Amareshwari Sriramadasu added a comment -

          Sorry John, the earlier patch has a bug.

          Any other filter on a 'udf selected as column alias in select' will also always be pushed down. Do we want to do this?

          More on this: currently the filter (along with the udf) is pushed down to the TableScan. The udf is therefore evaluated twice for every qualifying row: once in the pushed-down filter and once in the select. That is costly when the udf is expensive, so I propose that we not push the filter beyond the select. Thoughts?
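The double-evaluation cost can be made concrete by counting UDF calls. The following is a minimal Python model, not Hive's operators. `run` and `expensive_udf` are hypothetical, and the UDF is assumed deterministic, so both plans return the same rows and differ only in cost.

```python
from typing import Callable, Iterable, Tuple


def run(rows: Iterable[int], udf: Callable[[int], int], threshold: int,
        push_filter_to_scan: bool) -> Tuple[int, int]:
    """Model of: SELECT udf(x) AS y FROM t, followed by WHERE y < threshold.
    Returns (number of UDF invocations, number of output rows)."""
    calls = 0

    def counted(x: int) -> int:
        nonlocal calls
        calls += 1
        return udf(x)

    out = []
    for x in rows:
        if push_filter_to_scan:
            # Filter pushed to the TableScan: the UDF runs here for every row...
            if counted(x) < threshold:
                out.append(counted(x))  # ...and again in the SELECT for each qualifying row
        else:
            y = counted(x)              # the SELECT computes the UDF once per row
            if y < threshold:           # the filter stays above the SELECT and reuses y
                out.append(y)
    return calls, len(out)


def expensive_udf(x: int) -> int:
    """Stand-in for an expensive deterministic UDF."""
    return x * 2


print(run(range(100), expensive_udf, 60, push_filter_to_scan=True))   # (130, 30)
print(run(range(100), expensive_udf, 60, push_filter_to_scan=False))  # (100, 30)
```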

          John Sichi added a comment -

          Whoops, reassigned to Ido accidentally; reassigning to Amareshwari.

          John Sichi added a comment -

          I'm getting many regression test failures due to EXPLAIN plan changes.

          John Sichi added a comment -

          +1. Will commit when tests pass.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1404/
          -----------------------------------------------------------

          Review request for hive, John Sichi and Yongqiang He.

          Summary
          -------

          Any filter on a 'udf selected as column alias in select' is pushed down through the select operator, although it should not be. The patch addresses this by walking the udf expression again.

          This addresses bug HIVE-2344.
          https://issues.apache.org/jira/browse/HIVE-2344

          Diffs


          trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java 1153812
          trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1153812
          trunk/ql/src/test/queries/clientpositive/ppd_udf_col.q PRE-CREATION
          trunk/ql/src/test/results/clientpositive/ppd_udf_col.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/1404/diff

          Testing
          -------

          Thanks,

          Amareshwari

          Amareshwari Sriramadasu added a comment -

          Any other filter on a 'udf selected as column alias in select' will also always be pushed down. Do we want to do this? Might address in a separate jira.

          I'm addressing this in the patch as well.

          Amareshwari Sriramadasu added a comment -

          Any other filter on a 'udf selected as column alias in select' will also always be pushed down.

          Attaching test output with faulty explain plans.

          Amareshwari Sriramadasu added a comment -

          The problem is that the select operator chooses to push down the filter 'randum123 <=0.1' even though it is non-deterministic. Later, the filter is not pushed down after all, again because it is non-deterministic, so it is dropped from the plan entirely. Will upload a patch with the fix shortly.

          Any other filter on a 'udf selected as column alias in select' will also always be pushed down. Do we want to do this? Might address in a separate jira.
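The diagnosis above describes two steps that disagree about whether the predicate moves, and the filter falls through the gap. The following toy Python model sketches that sequence under stated assumptions. `Predicate`, `Plan`, `push_down` and the `select_checks_determinism` flag are hypothetical names. Following this comment, the model keys only on determinism, whereas the committed rule in the release note is broader. The `remove_duplicate_filters` flag stands in for `hive.ppd.remove.duplicatefilters`, the setting named in the workaround below.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Predicate:
    text: str
    deterministic: bool


@dataclass
class Plan:
    filter_above_select: List[Predicate] = field(default_factory=list)  # original Filter
    filter_below_select: List[Predicate] = field(default_factory=list)  # pushed-down copy


def push_down(pred: Predicate, remove_duplicate_filters: bool,
              select_checks_determinism: bool) -> Plan:
    plan = Plan()
    # Step 1: the Select operator decides whether the predicate may move below it.
    # In this model, the pre-fix behaviour is select_checks_determinism=False.
    if select_checks_determinism and not pred.deterministic:
        plan.filter_above_select.append(pred)  # not pushed; stays where it was written
        return plan
    # The original Filter is kept only if duplicate-filter removal is disabled.
    if not remove_duplicate_filters:
        plan.filter_above_select.append(pred)
    # Step 2: further down, non-deterministic predicates are discarded from pushdown.
    if pred.deterministic:
        plan.filter_below_select.append(pred)
    return plan


pred = Predicate("randum123 <= 0.1", deterministic=False)
# Bug: both lists are empty, so the filter has vanished from the plan.
print(push_down(pred, remove_duplicate_filters=True, select_checks_determinism=False))
# Workaround: the original filter survives above the select.
print(push_down(pred, remove_duplicate_filters=False, select_checks_determinism=False))
```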

          Amareshwari Sriramadasu added a comment -

          Looking into it.

          John Sichi added a comment -

          The workaround is to keep the original filter in place by disabling duplicate-filter removal:

          set hive.ppd.remove.duplicatefilters=false;


            People

            • Assignee: Amareshwari Sriramadasu
            • Reporter: He Yongqiang
            • Votes: 0
            • Watchers: 2
