Hive
  1. Hive
  2. HIVE-1376

Simple UDAFs with more than 1 parameter crash on empty row query

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Simple UDAFs with more than 1 parameter crash when the query returns no rows. Currently, this only seems to affect the percentile() UDAF where the second parameter is the percentile to be computed (of type double). I've also verified the bug by adding a dummy parameter to ExampleMin in contrib.

      On an empty query, Hive seems to be trying to resolve an iterate() method with signature

      {null,null}

      instead of

      {null,double}

      . You can reproduce this bug using:

      CREATE TABLE pct_test ( val INT );
      SELECT percentile(val, 0.5) FROM pct_test;

      which produces a lot of errors like:

      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) on object org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator@11d13272 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments

      {null, null}

      of size 2

      1. HIVE-1376.patch
        19 kB
        Ning Zhang
      2. HIVE-1376.2.patch
        9 kB
        Ning Zhang

        Activity

        Hide
        John Sichi added a comment -

        Some more details on this:

        • In the case of a full-table aggregation (no group by key) where no rows exist (or all get filtered out), the aggregation framework sends a row of all nulls to the aggregator. I don't know why this is necessary, since all of the existing aggregators ignore the null anyway.
        • Since the percentile UDAF uses a primitive double for the parameter type to the iterate method (rather than a Double or a DoubleWritable), Java reflection throws an IllegalArgumentException because it can't convert a null to a primitive.

        There are three possible solutions:

        (1) change percentile to use a non-primitive type

        (2) add more reflection and skip the attempt to send the null to iterate in the case where the parameter type is primitive

        (3) avoid sending the null in the first place (unless someone can explain why it's needed, or some regression test fails when we try it)

        Show
        John Sichi added a comment - Some more details on this: In the case of a full-table aggregation (no group by key) where no rows exist (or all get filtered out), the aggregation framework sends a row of all nulls to the aggregator. I don't know why this is necessary, since all of the existing aggregators ignore the null anyway. Since the percentile UDAF uses a primitive double for the parameter type to the iterate method (rather than a Double or a DoubleWritable), Java reflection throws an IllegalArgumentException because it can't convert a null to a primitive. There are three possible solutions: (1) change percentile to use a non-primitive type (2) add more reflection and skip the attempt to send the null to iterate in the case where the parameter type is primitive (3) avoid sending the null in the first place (unless someone can explain why it's needed, or some regression test fails when we try it)
        Hide
        Zheng Shao added a comment -

        I think (3) makes the most sense. If (3) does not work for whatever hard-to-fix reason, we can do (1).
        In any case, the change should be pretty simple.

        Show
        Zheng Shao added a comment - I think (3) makes the most sense. If (3) does not work for whatever hard-to-fix reason, we can do (1). In any case, the change should be pretty simple.
        Hide
        Ning Zhang added a comment -

        Attaching a patch for review. This patch also fixes HIVE-1674 (count returning wrong results).

        Tests are still running. Will upload a new patch if there are more changes.

        This patch implements 3) as suggest and SELECT PERCENTILE(col, 0.5) from src where false; will return a single row with NULL as value.

        Show
        Ning Zhang added a comment - Attaching a patch for review. This patch also fixes HIVE-1674 (count returning wrong results). Tests are still running. Will upload a new patch if there are more changes. This patch implements 3) as suggest and SELECT PERCENTILE(col, 0.5) from src where false; will return a single row with NULL as value.
        Hide
        Ning Zhang added a comment -

        The previous patch failed on several test, particularly count queries. Attaching a new patch for percentile only and will update a patch for HIVE-1674 separately.

        Show
        Ning Zhang added a comment - The previous patch failed on several test, particularly count queries. Attaching a new patch for percentile only and will update a patch for HIVE-1674 separately.
        Hide
        He Yongqiang added a comment -

        will take a look.

        Show
        He Yongqiang added a comment - will take a look.
        Hide
        He Yongqiang added a comment -

        the patch looks good. is there the same problem in other udafs? If yes, should we fix them one by one or fix them in the group by operator?

        Show
        He Yongqiang added a comment - the patch looks good. is there the same problem in other udafs? If yes, should we fix them one by one or fix them in the group by operator?
        Hide
        He Yongqiang added a comment -

        sorry, did not see the previous comments. John and Zheng have already discussed this problem. I will start running tests.

        Show
        He Yongqiang added a comment - sorry, did not see the previous comments. John and Zheng have already discussed this problem. I will start running tests.
        Hide
        He Yongqiang added a comment -

        I just committed! Thanks Ning!

        Show
        He Yongqiang added a comment - I just committed! Thanks Ning!
        Hide
        John Sichi added a comment -

        This patch only did (1), not (3). I think we'll still need a followup to avoid the problem for arbitrary UDAF's (unless we require them to avoid primitive types).

        Show
        John Sichi added a comment - This patch only did (1), not (3). I think we'll still need a followup to avoid the problem for arbitrary UDAF's (unless we require them to avoid primitive types).

          People

          • Assignee:
            Ning Zhang
            Reporter:
            Mayank Lahiri
          • Votes:
            3 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development