SPARK-1959

String "NULL" is interpreted as null value


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels: None

      Description

      The HiveTableScan operator unwraps the string "NULL" (case-insensitive) into a null value even when the column type is STRING.
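
      For illustration only, here is a minimal Scala sketch (not the actual HiveTableScan code) of the kind of case-insensitive conversion that would produce this behavior, alongside what one would expect for a STRING column:

      // Hypothetical sketch of the suspected behavior: the literal string "NULL"
      // is turned into a SQL null regardless of the column's declared type.
      def unwrapSuspected(raw: String): Any =
        if (raw != null && raw.equalsIgnoreCase("NULL")) null else raw

      // Expected behavior for a STRING column: pass the raw value through unchanged.
      def unwrapExpected(raw: String): String = raw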

      To reproduce the bug, we use sql/hive/src/test/resources/groupby_groupingid.txt as test input, copied to /tmp/groupby_groupingid.txt.
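
      If it helps, the input file can be staged from the same hive/console session with plain Java NIO (paths assume the Spark source tree is the current working directory):

      import java.nio.file.{Files, Paths, StandardCopyOption}

      // Copy the test input from the source tree to /tmp before loading it into Hive.
      Files.copy(
        Paths.get("sql/hive/src/test/resources/groupby_groupingid.txt"),
        Paths.get("/tmp/groupby_groupingid.txt"),
        StandardCopyOption.REPLACE_EXISTING)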

      Hive session:

      hive> CREATE TABLE test_null(key INT, value STRING);
      hive> LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null;
      hive> SELECT * FROM test_null WHERE value IS NOT NULL;
      ...
      OK
      1       NULL
      1       1
      2       2
      3       3
      3       NULL
      4       5
      

      We can see that Hive treats the NULL cells in the original input file as the literal string "NULL": all six rows pass the IS NOT NULL filter.

      Spark SQL session (sbt/sbt hive/console):

      scala> hql("CREATE TABLE test_null(key INT, value STRING)")
      scala> hql("LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null")
      scala> hql("SELECT * FROM test_null WHERE value IS NOT NULL").foreach(println)
      ...
      [1,1]
      [2,2]
      [3,3]
      [4,5]
      

      As we can see, in Spark SQL the string "NULL" is interpreted as a null value, so the two rows containing it are incorrectly filtered out.
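
      Once fixed, a hypothetical regression check in the same hive/console session (assuming hql returns a SchemaRDD whose rows expose getString) would look roughly like this:

      // Hypothetical check: after the fix, the query should return all six rows,
      // including the two whose value is the literal string "NULL".
      val rows = hql("SELECT * FROM test_null WHERE value IS NOT NULL").collect()
      assert(rows.length == 6, s"expected 6 rows, got ${rows.length}")
      assert(rows.count(_.getString(1) == "NULL") == 2,
        "string \"NULL\" values should be preserved for STRING columns")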


              People

              • Assignee:
                Cheng Lian
              • Reporter:
                Cheng Lian
              • Votes:
                0
              • Watchers:
                2
