[DRILL-2015] Casting numeric value that does not fit in integer data type produces incorrect result - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 0.8.0, 1.2.0
Fix Version/s: Future
Component/s: Execution - Data Types
Labels:
- document_if_not_fixed

Description

t1.json

{ "a1": 1 ,    "b1" : 1}
{ "a1": 2 ,    "b1" : 1}
{ "a1": 2 ,    "b1" : 2}
{ "a1": 3 ,    "b1" : 2}
{ "a1": 5000147483647 , "b1" : 3}

We should throw an error, this is technically data corruption.

0: jdbc:drill:schema=dfs> select cast(a1 as integer) from `t1.json`;
+------------+
|   EXPR$0   |
+------------+
| 1          |
| 2          |
| 2          |
| 3          |
| 805551103  |
+------------+
5 rows selected (0.074 seconds)

0: jdbc:drill:schema=dfs> select cast(2147483648 as integer) from `t1.json`;
+------------+
|   EXPR$0   |
+------------+
| -2147483648 |
| -2147483648 |
| -2147483648 |
| -2147483648 |
| -2147483648 |
+------------+
5 rows selected (0.076 seconds)

Attachments

Issue Links

incorporates

DRILL-3278 SUM(CAST(columns[0] AS BIGINT)) OVER(...) gives wrong results

Closed

Activity

Ascending order - Click to sort in descending order

Jacques Nadeau added a comment - 15/Jan/15 19:31

I think this needs to be looked at in the broader sense. Due we throw exception in the case of truncation or overflow. I know some databases have an option to do both. My inclination is to simply overflow. I'm up for doing this with literals but I don't think the performance impact warrants doing this for all transformations. We could support it as an option but that would be a lot of work and I'm not sure it is an important priority.

Jacques Nadeau added a comment - 15/Jan/15 19:31 I think this needs to be looked at in the broader sense. Due we throw exception in the case of truncation or overflow. I know some databases have an option to do both. My inclination is to simply overflow. I'm up for doing this with literals but I don't think the performance impact warrants doing this for all transformations. We could support it as an option but that would be a lot of work and I'm not sure it is an important priority.

Mehant Baid added a comment - 15/Jan/15 19:35

Most of the aggregate and simple functions don't care about overflow for performance like you mentioned, however there are a few cast functions that check for overflow and throw an exception. We need to align those cases based on what is decided.

Mehant Baid added a comment - 15/Jan/15 19:35 Most of the aggregate and simple functions don't care about overflow for performance like you mentioned, however there are a few cast functions that check for overflow and throw an exception. We need to align those cases based on what is decided.

Victoria Markman added a comment - 15/Jan/15 19:47

The default should be to throw an exception and give customer an option to turn off checking if performance is very critical.
It should be their "choice" if they want to get wrong results. In any case, if we decide not to implement an option for GA, it should be clearly documented.

Victoria Markman added a comment - 15/Jan/15 19:47 The default should be to throw an exception and give customer an option to turn off checking if performance is very critical. It should be their "choice" if they want to get wrong results. In any case, if we decide not to implement an option for GA, it should be clearly documented.

Victoria Markman added a comment - 05/Feb/15 06:21

Example with the simple function:

0: jdbc:drill:schema=dfs> select * from tab;
+------------+
|     a1     |
+------------+
| -3477884857818808320 |
| 0          |
| 0          |
| 4465148082249531392 |
| 4465148082249531392 |
+------------+
5 rows selected (0.057 seconds)

0: jdbc:drill:schema=dfs> select round(a1/13) from tab;
+------------+
|   EXPR$0   |
+------------+
| -2147483648 |
| 0          |
| 0          |
| 2147483647 |
| 2147483647 |
+------------+

Victoria Markman added a comment - 05/Feb/15 06:21 Example with the simple function: 0: jdbc:drill:schema=dfs> select * from tab; +------------+ | a1 | +------------+ | -3477884857818808320 | | 0 | | 0 | | 4465148082249531392 | | 4465148082249531392 | +------------+ 5 rows selected (0.057 seconds) 0: jdbc:drill:schema=dfs> select round(a1/13) from tab; +------------+ | EXPR$0 | +------------+ | -2147483648 | | 0 | | 0 | | 2147483647 | | 2147483647 | +------------+

Kris Hahn added a comment - 25/Aug/15 21:44

Added Unresolved Issues section to Drill 1.0 release notes: http://kristinehahn.github.io/drill/docs/apache-drill-1-0-0-release-notes/

The revision will appear on the Apache site next time we publish the docs.

Kris Hahn added a comment - 25/Aug/15 21:44 Added Unresolved Issues section to Drill 1.0 release notes: http://kristinehahn.github.io/drill/docs/apache-drill-1-0-0-release-notes/ The revision will appear on the Apache site next time we publish the docs.

Victoria Markman added a comment - 08/Oct/15 20:36

Changed priority to critical, since it is incorrect result.

Victoria Markman added a comment - 08/Oct/15 20:36 Changed priority to critical, since it is incorrect result.

People

Assignee:: Unassigned

Reporter:: Victoria Markman

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 15/Jan/15 18:56

Updated:: 08/Oct/15 20:36

Apache Drill