Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-2015

Casting numeric value that does not fit in integer data type produces incorrect result

Details

    Description

      t1.json

      { "a1": 1 ,    "b1" : 1}
      { "a1": 2 ,    "b1" : 1}
      { "a1": 2 ,    "b1" : 2}
      { "a1": 3 ,    "b1" : 2}
      { "a1": 5000147483647 , "b1" : 3}
      

      We should throw an error, this is technically data corruption.

      0: jdbc:drill:schema=dfs> select cast(a1 as integer) from `t1.json`;
      +------------+
      |   EXPR$0   |
      +------------+
      | 1          |
      | 2          |
      | 2          |
      | 3          |
      | 805551103  |
      +------------+
      5 rows selected (0.074 seconds)
      
      0: jdbc:drill:schema=dfs> select cast(2147483648 as integer) from `t1.json`;
      +------------+
      |   EXPR$0   |
      +------------+
      | -2147483648 |
      | -2147483648 |
      | -2147483648 |
      | -2147483648 |
      | -2147483648 |
      +------------+
      5 rows selected (0.076 seconds)
      

      Attachments

        Issue Links

          Activity

            jnadeau Jacques Nadeau added a comment -

            I think this needs to be looked at in the broader sense. Due we throw exception in the case of truncation or overflow. I know some databases have an option to do both. My inclination is to simply overflow. I'm up for doing this with literals but I don't think the performance impact warrants doing this for all transformations. We could support it as an option but that would be a lot of work and I'm not sure it is an important priority.

            jnadeau Jacques Nadeau added a comment - I think this needs to be looked at in the broader sense. Due we throw exception in the case of truncation or overflow. I know some databases have an option to do both. My inclination is to simply overflow. I'm up for doing this with literals but I don't think the performance impact warrants doing this for all transformations. We could support it as an option but that would be a lot of work and I'm not sure it is an important priority.
            mehant Mehant Baid added a comment -

            Most of the aggregate and simple functions don't care about overflow for performance like you mentioned, however there are a few cast functions that check for overflow and throw an exception. We need to align those cases based on what is decided.

            mehant Mehant Baid added a comment - Most of the aggregate and simple functions don't care about overflow for performance like you mentioned, however there are a few cast functions that check for overflow and throw an exception. We need to align those cases based on what is decided.

            The default should be to throw an exception and give customer an option to turn off checking if performance is very critical.
            It should be their "choice" if they want to get wrong results. In any case, if we decide not to implement an option for GA, it should be clearly documented.

            vicky Victoria Markman added a comment - The default should be to throw an exception and give customer an option to turn off checking if performance is very critical. It should be their "choice" if they want to get wrong results. In any case, if we decide not to implement an option for GA, it should be clearly documented.

            Example with the simple function:

            0: jdbc:drill:schema=dfs> select * from tab;
            +------------+
            |     a1     |
            +------------+
            | -3477884857818808320 |
            | 0          |
            | 0          |
            | 4465148082249531392 |
            | 4465148082249531392 |
            +------------+
            5 rows selected (0.057 seconds)
            
            0: jdbc:drill:schema=dfs> select round(a1/13) from tab;
            +------------+
            |   EXPR$0   |
            +------------+
            | -2147483648 |
            | 0          |
            | 0          |
            | 2147483647 |
            | 2147483647 |
            +------------+
            
            vicky Victoria Markman added a comment - Example with the simple function: 0: jdbc:drill:schema=dfs> select * from tab; +------------+ | a1 | +------------+ | -3477884857818808320 | | 0 | | 0 | | 4465148082249531392 | | 4465148082249531392 | +------------+ 5 rows selected (0.057 seconds) 0: jdbc:drill:schema=dfs> select round(a1/13) from tab; +------------+ | EXPR$0 | +------------+ | -2147483648 | | 0 | | 0 | | 2147483647 | | 2147483647 | +------------+
            krishahn Kris Hahn added a comment -

            Added Unresolved Issues section to Drill 1.0 release notes: http://kristinehahn.github.io/drill/docs/apache-drill-1-0-0-release-notes/

            The revision will appear on the Apache site next time we publish the docs.

            krishahn Kris Hahn added a comment - Added Unresolved Issues section to Drill 1.0 release notes: http://kristinehahn.github.io/drill/docs/apache-drill-1-0-0-release-notes/ The revision will appear on the Apache site next time we publish the docs.

            Changed priority to critical, since it is incorrect result.

            vicky Victoria Markman added a comment - Changed priority to critical, since it is incorrect result.

            People

              Unassigned Unassigned
              vicky Victoria Markman
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: