Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1770

Druid adapter: CAST(NULL AS ...) gives NPE

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: druid
    • Labels:
      None

      Description

      Following query fails with NPE

       SELECT product_id from foodmart where product_id = NULL group by product_id 

        Activity

        Hide
        bslim slim bouguerra added a comment -
        Show
        bslim slim bouguerra added a comment - yet another "small fix" https://github.com/apache/calcite/pull/441
        Hide
        julianhyde Julian Hyde added a comment -

        2 checkstyle violations, and I expect that maven-dependency-plugin will complain because we don't explicitly include commons-lang.

        ObjectUtils.toString is a small method but its behavior is not obvious. I would just inline it and skip the use of commons-lang. What do you think.

        Show
        julianhyde Julian Hyde added a comment - 2 checkstyle violations, and I expect that maven-dependency-plugin will complain because we don't explicitly include commons-lang. ObjectUtils.toString is a small method but its behavior is not obvious. I would just inline it and skip the use of commons-lang. What do you think.
        Hide
        bslim slim bouguerra added a comment -

        Thanks for the suggestion, agree, updated the PR.

        Show
        bslim slim bouguerra added a comment - Thanks for the suggestion, agree, updated the PR.
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/ef00738a . Thanks for the PR, slim bouguerra !
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        slim bouguerra, Julian Hyde, I understand that Druid does not support NULL and represents them as empty values. But pushing a Filter with a NULL converted into an empty String does not seem to fix this mismatch, but rather produce wrong results.

        IMO failing in such cases or at least using a configuration option to enable this kind of behavior is a better option.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - slim bouguerra , Julian Hyde , I understand that Druid does not support NULL and represents them as empty values. But pushing a Filter with a NULL converted into an empty String does not seem to fix this mismatch, but rather produce wrong results. IMO failing in such cases or at least using a configuration option to enable this kind of behavior is a better option.
        Hide
        bslim slim bouguerra added a comment -

        Not sure i am getting your point here. If the data storage (druid) does store null as empty, then i am not sure what is the difference between having the filter done by druid or done let say by other processing engine ?
        What i am trying to say if you can not make the distinction within the storage layer, whatever you do in the filter will not change the outcome.

        Show
        bslim slim bouguerra added a comment - Not sure i am getting your point here. If the data storage (druid) does store null as empty, then i am not sure what is the difference between having the filter done by druid or done let say by other processing engine ? What i am trying to say if you can not make the distinction within the storage layer, whatever you do in the filter will not change the outcome.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        slim bouguerra, that is correct. However we should adhere to the SQL standard and return what the user is asking for in the query, otherwise we might be creating an even bigger deficit.

        To give an example, consider the filter that you specified in the test query:
        \"product_id\" = cast(NULL as varchar)
        That filter evaluates always to false in SQL, since comparing a column to NULL using the = operator returns UNKNOWN. However, with this fix, the filter is evaluated to true for those records for which \"product_id\" is empty. In another setting, if constant folding would kick in and fold the expression to false, we will return different results for otherwise equivalent queries.

        IMO, if there is a limitation in the storage engine, we should expose this to the end user instead of making an assumption on the query, as maybe the user does not even know about this limitation in the first place. Or at least, as an alternative, this should be a configurable option.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - slim bouguerra , that is correct. However we should adhere to the SQL standard and return what the user is asking for in the query, otherwise we might be creating an even bigger deficit. To give an example, consider the filter that you specified in the test query: \"product_id\" = cast(NULL as varchar) That filter evaluates always to false in SQL, since comparing a column to NULL using the = operator returns UNKNOWN. However, with this fix, the filter is evaluated to true for those records for which \"product_id\" is empty. In another setting, if constant folding would kick in and fold the expression to false , we will return different results for otherwise equivalent queries. IMO, if there is a limitation in the storage engine, we should expose this to the end user instead of making an assumption on the query, as maybe the user does not even know about this limitation in the first place. Or at least, as an alternative, this should be a configurable option.
        Hide
        bslim slim bouguerra added a comment -

        Thanks i see what your are saying, it is true that we should stick as much we can to the standard but i guess we still need to have a handling for cases where the parser will not be able to fold the expressions. For instance suppose we have a black box udf that will return null, not sure what is the way to deal with that ?

        Show
        bslim slim bouguerra added a comment - Thanks i see what your are saying, it is true that we should stick as much we can to the standard but i guess we still need to have a handling for cases where the parser will not be able to fold the expressions. For instance suppose we have a black box udf that will return null, not sure what is the way to deal with that ?
        Hide
        bslim slim bouguerra added a comment -

        Also please keep in mind that this fix did is not responsible of pushing cast over null, the issue your are pointing is due to the fact we allow any cast to varchar.

        Show
        bslim slim bouguerra added a comment - Also please keep in mind that this fix did is not responsible of pushing cast over null, the issue your are pointing is due to the fact we allow any cast to varchar.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

        Nullability of functions is taken care of by Calcite type system: the type of the value returned by a function can be 'nullable' or 'not nullable', and the logic to fold expressions uses that information to remain correct.

        Concerning the CAST, \"product_id\" = NULL seems to face the same issue (return records with empty product_id instead of no records).

        Finally, for this specific issue issue, I agree that we needed to fix the NPE. However, as Druid does not support SQL null semantics, I think the correct way to handle this case would have been to bail out from pushing filters/expressions that contain NULL literals, just as we do for functions that handle null values such as IS NULL or IS NOT NULL. What do you think?

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited Nullability of functions is taken care of by Calcite type system: the type of the value returned by a function can be 'nullable' or 'not nullable', and the logic to fold expressions uses that information to remain correct. Concerning the CAST, \"product_id\" = NULL seems to face the same issue (return records with empty product_id instead of no records). Finally, for this specific issue issue, I agree that we needed to fix the NPE. However, as Druid does not support SQL null semantics, I think the correct way to handle this case would have been to bail out from pushing filters/expressions that contain NULL literals, just as we do for functions that handle null values such as IS NULL or IS NOT NULL. What do you think?
        Hide
        bslim slim bouguerra added a comment -

        What is the sql semantic for filter like product_id = cast('not_not_a_number' as integer) ?

        Show
        bslim slim bouguerra added a comment - What is the sql semantic for filter like product_id = cast('not_not_a_number' as integer) ?
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

        My guess is that Calcite will throw a parsing error (for instance, that is what PostgresSQL does). Otherwise, other systems may throw a runtime error. I am not sure what the standard specifies in this case when CAST is not valid (I could check later on to see if it is specified) but those are standard responses by most RDBMS.

        Edit:
        From the standard:

        If SV does not comprise a <signed numeric literal> as defined by the rules for <literal> in Subclause 5.3, “<literal>”, then an exception condition is raised: data exception — invalid character value for cast.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited My guess is that Calcite will throw a parsing error (for instance, that is what PostgresSQL does). Otherwise, other systems may throw a runtime error. I am not sure what the standard specifies in this case when CAST is not valid (I could check later on to see if it is specified) but those are standard responses by most RDBMS. Edit: From the standard: If SV does not comprise a <signed numeric literal> as defined by the rules for <literal> in Subclause 5.3, “<literal>”, then an exception condition is raised: data exception — invalid character value for cast.
        Hide
        bslim slim bouguerra added a comment -

        Ok thanks, seems like hive does not honer that and sends in a null rel node.

        Show
        bslim slim bouguerra added a comment - Ok thanks, seems like hive does not honer that and sends in a null rel node.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Yes, it does not surprise me... In any case, there is an ongoing effort to close those gaps, we can handle that in a Hive issue.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Yes, it does not surprise me... In any case, there is an ongoing effort to close those gaps, we can handle that in a Hive issue.
        Hide
        julianhyde Julian Hyde added a comment - - edited

        Regarding treatment of nulls. I totally agree with Jesus. Let's treat nulls strictly in Calcite (and Hive) and deliver SQL semantics despite Druid's different behavior. Who knows, one day Druid may be able to store nulls and will implement 3-valued logic.

        I believe that this fix is consistent with SQL semantics (i.e. the query will return 0 rows because product_id = null always yields UNKNOWN). If I'm wrong let's re-open the case.

        By the way, CALCITE-815 would add an option that treats empty strings as null values (basically, converts them to null values on the fly) but still retains 3-valued logic. Thus '' = 'a' returns UNKNOWN. It even affects type derivation: the return type of SUBSTRING(x FROM y FOR z) is VARCHAR(n) NOT NULL, but with emptyStringIsNull=true, its type is VARCHAR(n) because it can return an empty string which would become NULL.

        Show
        julianhyde Julian Hyde added a comment - - edited Regarding treatment of nulls. I totally agree with Jesus. Let's treat nulls strictly in Calcite (and Hive) and deliver SQL semantics despite Druid's different behavior. Who knows, one day Druid may be able to store nulls and will implement 3-valued logic. I believe that this fix is consistent with SQL semantics (i.e. the query will return 0 rows because product_id = null always yields UNKNOWN ). If I'm wrong let's re-open the case. By the way, CALCITE-815 would add an option that treats empty strings as null values (basically, converts them to null values on the fly) but still retains 3-valued logic. Thus '' = 'a' returns UNKNOWN. It even affects type derivation: the return type of SUBSTRING(x FROM y FOR z) is VARCHAR(n) NOT NULL , but with emptyStringIsNull=true , its type is VARCHAR(n) because it can return an empty string which would become NULL.
        Show
        bslim slim bouguerra added a comment - https://github.com/apache/calcite/pull/448
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        New fix that avoids pushing NULL literal values in http://git-wip-us.apache.org/repos/asf/calcite/commit/ea994131c . Thanks slim bouguerra!

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - New fix that avoids pushing NULL literal values in http://git-wip-us.apache.org/repos/asf/calcite/commit/ea994131c . Thanks slim bouguerra !
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Resolved in release 1.13.0 (2017-06-26).

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.13.0 (2017-06-26).

          People

          • Assignee:
            bslim slim bouguerra
            Reporter:
            bslim slim bouguerra
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development