[SPARK-1959] String "NULL" is interpreted as null value


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels: None

    Description

      The HiveTableScan operator unwraps the string "NULL" (case-insensitive) into a null value even when the column type is STRING.
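
      The faulty behavior can be illustrated with a minimal sketch of the kind of unwrapping logic involved (a hypothetical simplification for illustration, not the actual HiveTableScan source):

      // Hypothetical simplification: any string spelling "null" is converted
      // to a real null, even though the declared column type is STRING.
      def unwrap(value: Any): Any = value match {
        case s: String if s.equalsIgnoreCase("NULL") => null
        case other => other
      }

      assert(unwrap("NULL") == null)  // a STRING cell holding "NULL" is silently lost
      assert(unwrap("1") == "1")      // other strings pass through unchanged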

      To reproduce the bug, we use sql/hive/src/test/resources/groupby_groupingid.txt as test input, copied to /tmp/groupby_groupingid.txt.

      Hive session:

      hive> CREATE TABLE test_null(key INT, value STRING);
      hive> LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null;
      hive> SELECT * FROM test_null WHERE value IS NOT NULL;
      ...
      OK
      1       NULL
      1       1
      2       2
      3       3
      3       NULL
      4       5
      

      We can see that the NULL cells in the original input file are interpreted as the string "NULL" in Hive, so the IS NOT NULL filter keeps those rows.
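
      To double-check that Hive really keeps these cells as the literal string "NULL" rather than a true null, one can filter on the string directly (an extra query, not part of the original report):

      hive> SELECT * FROM test_null WHERE value = 'NULL';

      If the cells are ordinary strings, this should return exactly the two rows with keys 1 and 3 shown above.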

      Spark SQL session (sbt/sbt hive/console):

      scala> hql("CREATE TABLE test_null(key INT, value STRING)")
      scala> hql("LOAD DATA LOCAL INPATH '/tmp/groupby_groupingid.txt' INTO table test_null")
      scala> hql("SELECT * FROM test_null WHERE value IS NOT NULL").foreach(println)
      ...
      [1,1]
      [2,2]
      [3,3]
      [4,5]
      

      As we can see, the string "NULL" is interpreted as a null value in Spark SQL, so the two rows containing it are incorrectly filtered out by the IS NOT NULL predicate.
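
      A quick way to confirm that the cells are actually converted to nulls, rather than merely filtered out, is to count them directly in the same console (this check is not part of the original report):

      scala> hql("SELECT COUNT(*) FROM test_null WHERE value IS NULL").foreach(println)

      Given the six rows loaded above, Hive should report 0 for this query, while this Spark SQL session would report 2, confirming that the two "NULL" strings were turned into real nulls.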

            People

              Assignee: Cheng Lian
              Reporter: Cheng Lian