SPARK-11894 (parent: SPARK-9999 Dataset API on top of Catalyst/DataFrame)

Incorrect results are returned when using null

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: SQL
    • Labels:
      None

      Description

      In the Dataset API, the following two Datasets are treated as identical, even though the first holds 0 and the second holds null:
      Seq((new java.lang.Integer(0), "1"), (new java.lang.Integer(22), "2")).toDS()
      Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()

      Note: java.lang.Integer is a nullable type, so null is a legitimate value here.
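
The symptom described here is consistent with a nullable boxed value passing through a primitive int slot that carries no null flag. The following is a minimal plain-Scala sketch of that effect under that assumption; it is not Spark's encoder code, and `NullCollapse`, `lossyRoundTrip`, and `safeRoundTrip` are hypothetical names introduced only for illustration.

```scala
// Illustrative only: plain Scala, no Spark dependency.
object NullCollapse {
  // Lossy path: the boxed value is written into a primitive Int slot.
  // A primitive cannot represent null, so null falls back to the default 0.
  def lossyRoundTrip(v: java.lang.Integer): java.lang.Integer = {
    val slot: Int = if (v == null) 0 else v.intValue()
    java.lang.Integer.valueOf(slot)
  }

  // Null-aware path: nullness is tracked alongside the value (here via Option),
  // so null survives the round trip.
  def safeRoundTrip(v: java.lang.Integer): java.lang.Integer = {
    val slot: Option[Int] = Option(v).map(_.intValue())
    slot.map(i => java.lang.Integer.valueOf(i)).orNull
  }

  def main(args: Array[String]): Unit = {
    println(lossyRoundTrip(null)) // null is indistinguishable from 0 afterwards
    println(safeRoundTrip(null))  // null is preserved
  }
}
```

After the lossy round trip a null input and a 0 input can no longer be told apart, which mirrors how the two Datasets above end up looking the same.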

      This can produce incorrect results. For example:

      val ds1 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
      val ds2 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()//toDF("key", "value").as('df2)

      val res1 = ds1.joinWith(ds2, lit(true)).collect()

      The expected result is:
      ((null,1),(null,1))
      ((22,2),(null,1))
      ((null,1),(22,2))
      ((22,2),(22,2))

      The actual result is:
      ((0,1),(0,1))
      ((22,2),(0,1))
      ((0,1),(22,2))
      ((22,2),(22,2))
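
As a reference point, the expected rows can be reproduced without Spark by pairing every left row with every right row, which is what an always-true join condition should yield. This is a sketch in plain Scala collections; `ExpectedCrossJoin`, `rows`, and `crossJoin` are hypothetical names, and the iteration order is chosen only to match the listing above (Spark does not guarantee row order).

```scala
// Illustrative only: computes the expected pairs with plain Scala collections.
object ExpectedCrossJoin {
  type Row = (java.lang.Integer, String)

  // Same contents as ds1 and ds2 in the report.
  val rows: Seq[Row] = Seq(
    (null.asInstanceOf[java.lang.Integer], "1"),
    (java.lang.Integer.valueOf(22), "2"))

  // Cartesian product: every left row paired with every right row.
  def crossJoin(left: Seq[Row], right: Seq[Row]): Seq[(Row, Row)] =
    for (r <- right; l <- left) yield (l, r)

  def main(args: Array[String]): Unit =
    crossJoin(rows, rows).foreach(println)
}
```

Because the key stays a boxed `java.lang.Integer` throughout, the null key is kept as null rather than being replaced by 0.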

            People

            • Assignee: Wenchen Fan (cloud_fan)
            • Reporter: Xiao Li (smilegator)