Description
The HiveTableScan operator unwraps the string "NULL" (case-insensitive) into a null value even when the column type is STRING.
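A minimal sketch (hypothetical, not the actual Spark source) of the kind of unwrapping that would produce this symptom, where the scan treats the cell text "NULL" as a null marker even for STRING columns instead of keeping it verbatim:

// buggyUnwrap is an illustrative helper, not a Spark API.
def buggyUnwrap(raw: String): Any =
  if (raw != null && raw.equalsIgnoreCase("NULL")) null  // bug: "NULL"/"null" is a legal STRING value
  else raw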
To reproduce the bug, we use sql/hive/src/test/resources/groupby_groupingid.txt as test input, copied to /tmp/groupby_groupingid.txt.
Hive session:
hive> CREATE TABLE test_null(key INT, value STRING);
hive> LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null;
hive> SELECT * FROM test_null WHERE value IS NOT NULL;
...
OK
1 NULL
1 1
2 2
3 3
3 NULL
4 5
We can see that the NULL cells in the original input file are interpreted as the string "NULL" in Hive, so all six rows (including the two whose value is "NULL") pass the IS NOT NULL filter.
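For context, Hive's text SerDe marks on-disk nulls with the sequence "\N" (the serialization.null.format table property) by default, so the literal text NULL in the data file is just a four-character string. A toy parser mirroring that rule (parseTextCell is a hypothetical helper, not a Hive API):

// Only the configured null sequence ("\N" by default) maps to null;
// the text "NULL" survives as an ordinary string.
def parseTextCell(raw: String, nullFormat: String = "\\N"): Option[String] =
  if (raw == nullFormat) None else Some(raw)

assert(parseTextCell("NULL") == Some("NULL"))
assert(parseTextCell("\\N").isEmpty)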
Spark SQL session (sbt/sbt hive/console):
scala> hql("CREATE TABLE test_null(key INT, value STRING)")
scala> hql("LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null")
scala> hql("SELECT * FROM test_null WHERE value IS NOT NULL").foreach(println)
...
[1,1]
[2,2]
[3,3]
[4,5]
As we can see, the string "NULL" is interpreted as a null value in Spark SQL, so the two rows whose value is "NULL" are incorrectly filtered out by IS NOT NULL.
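A quick way to observe the discrepancy from the same hive/console session (a sketch; the expected count of 6 comes from the Hive output above):

scala> hql("SELECT COUNT(*) FROM test_null WHERE value IS NOT NULL").collect().foreach(println)
// Hive semantics would give [6]; with this bug, Spark SQL gives [4].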
Issue Links
- is related to SPARK-3683: PySpark Hive query generates "NULL" instead of None (Closed)