  1. Spark
  2. SPARK-9999 Dataset API on top of Catalyst/DataFrame
  3. SPARK-11894

Incorrect results are returned when using null


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      In the Dataset API, the following two Datasets are treated as the same, even though the second one contains a null where the first contains 0:
      Seq((new java.lang.Integer(0), "1"), (new java.lang.Integer(22), "2")).toDS()
      Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()

      Note: java.lang.Integer is a nullable type, so the null value is legal and should be preserved.

      This can produce incorrect results. For example:

      val ds1 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
      val ds2 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()//toDF("key", "value").as('df2)

      val res1 = ds1.joinWith(ds2, lit(true)).collect()

      The expected result should be
      ((null,1),(null,1))
      ((22,2),(null,1))
      ((null,1),(22,2))
      ((22,2),(22,2))
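
      The expected rows above are what a join on an always-true condition should yield: every row of ds1 paired with every row of ds2, with nulls left intact. A minimal sketch of that semantics using plain Scala collections follows. It does not use Spark, and the names ExpectedJoin, Row, and expected are hypothetical and introduced only for illustration.

      ```scala
      // Plain Scala collections, no Spark: a sketch of the expected semantics only.
      object ExpectedJoin {
        // Hypothetical alias for the (key, value) rows used in the report.
        type Row = (java.lang.Integer, String)

        val ds1: Seq[Row] = Seq((null: java.lang.Integer, "1"), (Integer.valueOf(22), "2"))
        val ds2: Seq[Row] = Seq((null: java.lang.Integer, "1"), (Integer.valueOf(22), "2"))

        // A join whose condition is always true pairs every left row with every right row.
        val expected: Seq[(Row, Row)] = for (r <- ds2; l <- ds1) yield (l, r)

        def main(args: Array[String]): Unit = expected.foreach(println)
      }
      ```

      The boxed java.lang.Integer keys keep null as null throughout, which is the behavior the report expects from joinWith.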

      The actual result is
      ((0,1),(0,1))
      ((22,2),(0,1))
      ((0,1),(22,2))
      ((22,2),(22,2))
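
      The 0 in the actual result matches what Scala produces when a null is coerced to a primitive Int. The report does not state the root cause inside Spark, so the sketch below only illustrates that language-level behavior and is not the confirmed mechanism. The object name NullUnboxing is hypothetical.

      ```scala
      // Illustration of Scala's null-to-primitive coercion; not Spark code.
      object NullUnboxing {
        // A boxed Integer can hold null, so the null survives.
        val boxed: java.lang.Integer = null.asInstanceOf[java.lang.Integer]

        // A primitive Int cannot hold null; the cast yields the default value 0.
        val primitive: Int = null.asInstanceOf[Int]

        def main(args: Array[String]): Unit = {
          println(boxed == null)
          println(primitive == 0)
        }
      }
      ```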


          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Xiao Li (smilegator)