[SPARK-19971] Weird SELECT equal behaviour. - ASF JIRA

Let's say we have a CSV file, /tmp/1.csv:

cid,name
-100224910923912596,jack
-100224910923912595,tom
-1,rose
-2,marry
-100,rose1
-101,rose2

Use the following SQL to define a view in Spark SQL:

CREATE TEMPORARY VIEW T
(
  `cid` string,
  `name` string
)
USING CSV
OPTIONS (
  path "/tmp/1.csv"
);

Statement 1:

select * from T where cid = -100224910923912596;

Returns:

-100224910923912596	jack
-100224910923912595	tom

Statement 2:

select * from T where cid = -100224910923912599;

It also returns the same two rows, even though no row has that cid:

-100224910923912596	jack
-100224910923912595	tom

Statement 3 (the same value as in statement 1, but quoted as a string literal):

select * from T where cid = '-100224910923912596';

It returns:

-100224910923912596	jack

However, I think the behaviour of statements 1 and 2 is pretty weird.

Statement 4:

select * from T where cid = -100;

Returns:

-100 rose1

This only affects large numbers; smaller ones seem to work fine.

Does that look like a bug to you folks?

Thanks.

duplicates

SPARK-17913 Filter/join expressions can return incorrect results when comparing strings to longs
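
One plausible reading of the results above, consistent with the title of the linked issue, is that the string column and the integer literal are both converted to a 64-bit double before `=` is evaluated. That is an assumption here, not something this report confirms. Under that assumption, the sketch below reproduces the reported rows in plain Python (no Spark involved); `rows`, `matches` and `select` are hypothetical names used only for illustration.

```python
# Rows copied from /tmp/1.csv in the report; cid is kept as a string,
# as in the view definition.
rows = [
    ("-100224910923912596", "jack"),
    ("-100224910923912595", "tom"),
    ("-1", "rose"),
    ("-2", "marry"),
    ("-100", "rose1"),
    ("-101", "rose2"),
]


def matches(cid: str, literal: int) -> bool:
    # ASSUMPTION: both sides are cast to a 64-bit double before comparing.
    # Doubles have a 53-bit significand, so integers of this magnitude
    # (between 2**56 and 2**57) are only representable in steps of 16.
    return float(cid) == float(literal)


def select(literal: int) -> list:
    # Emulates: select * from T where cid = <literal>
    return [name for cid, name in rows if matches(cid, literal)]


print(select(-100224910923912596))  # statement 1
print(select(-100224910923912599))  # statement 2
print(select(-100))                 # statement 4
```

Statements 1 and 2 both select jack and tom in this sketch because ...596, ...595 and ...599 all round to the same double, -100224910923912592. The small values -100 and -101 are exactly representable, so statement 4 selects a single row. Quoting the literal, as in statement 3, compares string to string and avoids the rounding.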