[SPARK-18489] Implicit type conversion during comparison between Integer type column and String type column - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels: None

Description

Suppose I have a dataframe with schema:

root
 |-- _c0: integer (nullable = true)
 |-- _c1: double (nullable = true)
 |-- _c2: string (nullable = true)

and data:

+---+---+----+
|_c0|_c1| _c2|
+---+---+----+
|  1|1.0|   1|
|  2|1.0|   s|
|  3|3.1|null|
+---+---+----+

If the following operations are carried out:

df.where("_c1==_c2").show
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
|  1|1.0|  1|
+---+---+---+

df.where("_c1<>_c2").show   or   df.where("_c1!=_c2").show 
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
+---+---+---+

So the results of these comparisons are ambiguous: the rows where _c2 is "s" or null appear in neither the == result nor the <> result.
Here the string values that look numeric are implicitly cast for the comparison, whereas the others are silently ignored instead of raising an exception.
In my view this can lead to incorrect results if the dataset is not inspected carefully.
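
The empty <> result is consistent with SQL three-valued logic, assuming the string column is cast to double before the comparison: a value such as "s" becomes null, a comparison with null is null, and a filter keeps only rows whose predicate is true. The following is a minimal sketch in plain Scala, with no Spark dependency, that models this behavior. The names `ImplicitCastModel`, `castToDouble`, `sqlEq`, and `sqlNeq` are hypothetical helpers written for illustration, not Spark APIs, and Scala's `toDouble` only approximates Spark's string parsing.

```scala
// Hypothetical model of the comparison semantics; this is not Spark code.
object ImplicitCastModel {
  // Mimics a lenient string-to-double cast: null or unparseable input becomes None (SQL NULL).
  def castToDouble(s: String): Option[Double] =
    Option(s).flatMap(v => scala.util.Try(v.trim.toDouble).toOption)

  // Three-valued equality: None stands for SQL NULL (unknown).
  def sqlEq(a: Option[Double], b: Option[Double]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // Three-valued inequality: the negation of unknown is still unknown.
  def sqlNeq(a: Option[Double], b: Option[Double]): Option[Boolean] =
    sqlEq(a, b).map(!_)

  // (_c0, _c1, _c2) rows from the description above.
  val rows: Seq[(Int, Double, String)] =
    Seq((1, 1.0, "1"), (2, 1.0, "s"), (3, 3.1, null))

  // A filter keeps a row only when the predicate is exactly true, so unknown rows are dropped.
  val eqIds: Seq[Int] = rows.collect {
    case (id, d, s) if sqlEq(Some(d), castToDouble(s)).contains(true) => id
  }
  val neqIds: Seq[Int] = rows.collect {
    case (id, d, s) if sqlNeq(Some(d), castToDouble(s)).contains(true) => id
  }

  def main(args: Array[String]): Unit = {
    println(s"== keeps _c0 in $eqIds")
    println(s"<> keeps _c0 in $neqIds")
  }
}
```

In this model only the row with _c0 = 1 passes the equality filter and no row passes the inequality filter, which matches the two outputs shown above.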

Also, the SQL-99 standard discourages implicit casting in order to avoid such problems.
https://users.dcc.uchile.cl/~cgutierr/cursos/BD/standards.pdf

The same implicit casting also applies to UDFs and aggregate functions.
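
One possible way to guard against this is to make the cast explicit and to list the rows that would otherwise be dropped silently. The sketch below assumes the lenient (non-ANSI) cast behavior, in which an unparseable string casts to null. The object name `ExplicitCastCheck` and its two helper methods are hypothetical, and the sample DataFrame is rebuilt from the rows in the description (its nullability flags may differ from the schema printed above).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; names are illustrative only.
object ExplicitCastCheck {
  // Rows whose _c2 is present but becomes null under an explicit cast to double.
  def unparseable(df: DataFrame): DataFrame =
    df.where(col("_c2").isNotNull && col("_c2").cast("double").isNull)

  // Null-safe "not equal": <=> never evaluates to null, so no row is dropped as unknown.
  def nullSafeNotEqual(df: DataFrame): DataFrame =
    df.where(!(col("_c1") <=> col("_c2").cast("double")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explicit-cast-check")
      // Assumption: lenient casts. Under ANSI mode the cast of "s" raises an error instead.
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the description.
    val df = Seq[(Int, Double, String)]((1, 1.0, "1"), (2, 1.0, "s"), (3, 3.1, null))
      .toDF("_c0", "_c1", "_c2")

    unparseable(df).show()      // candidates for data cleaning
    nullSafeNotEqual(df).show() // rows that differ, including null and unparseable ones

    spark.stop()
  }
}
```

Whether null or unparseable values should count as "not equal" depends on the use case; the null-safe operator simply makes that choice explicit instead of leaving it to the implicit cast.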

Issue Links

duplicates

SPARK-17913 Filter/join expressions can return incorrect results when comparing strings to longs

Resolved

People

Assignee: Unassigned
Reporter: Bipul Kumar
Votes: 0
Watchers: 2

Dates

Created: 17/Nov/16 11:29
Updated: 07/Feb/20 17:23
Resolved: 17/Nov/16 11:50