Spark / SPARK-33246

Spark SQL null semantics documentation is incorrect

    Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 3.0.2, 3.1.0
    • Fix Version/s: 3.0.2, 3.1.0
    • Component/s: Documentation, SQL
    • Labels:
      None

      Description

      The documentation of Spark SQL's null semantics is (I believe) incorrect.

      The documentation states that "NULL AND False" yields NULL, when in fact it yields False.

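      // Assumes a spark-shell session, where `spark` (a SparkSession) is already defined.
      import spark.implicits._  // provides toDF and the 'column symbol syntax

      // java.lang.Boolean is used instead of scala.Boolean so that operands can be null.
      Seq[(java.lang.Boolean, java.lang.Boolean)](
        (true, null),
        (false, null),
        (null, true),
        (null, false),
        (null, null)
      )
        .toDF("left_operand", "right_operand")
        .withColumn("OR", 'left_operand || 'right_operand)    // three-valued OR
        .withColumn("AND", 'left_operand && 'right_operand)   // three-valued AND
        .show(truncate = false)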
      
      +------------+-------------+----+-----+
      |left_operand|right_operand|OR  |AND  |
      +------------+-------------+----+-----+
      |true        |null         |true|null |
      |false       |null         |null|false|
      |null        |true         |true|null |
      |null        |false        |null|false|  <---- this line is incorrect in the docs
      |null        |null         |null|null |
      +------------+-------------+----+-----+
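
The table above is consistent with standard three-valued logic, in which a single false operand decides AND and a single true operand decides OR, regardless of whether the other operand is NULL. As a rough illustration only, the sketch below models NULL as `None` in plain Scala without Spark. The names `NullLogic`, `Tri`, `and3` and `or3` are hypothetical and are not part of any Spark API.

```scala
// Three-valued logic sketch: SQL NULL is modelled as None.
// NullLogic, Tri, and3 and or3 are illustrative names, not Spark APIs.
object NullLogic {
  type Tri = Option[Boolean]

  // AND: false if either operand is false; true only if both are true; otherwise NULL.
  def and3(l: Tri, r: Tri): Tri = (l, r) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // OR: true if either operand is true; false only if both are false; otherwise NULL.
  def or3(l: Tri, r: Tri): Tri = (l, r) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }
}
```

Under this model, `NullLogic.and3(None, Some(false))` evaluates to `Some(false)`, which corresponds to the row flagged in the table.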
      

    People

    • Assignee: Stuart White (stwhit)
    • Reporter: Stuart White (stwhit)
