  1. Spark
  2. SPARK-38868

`assert_true` fails unconditionally after `left_outer` joins

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0, 3.4.0
    • Fix Version/s: 3.1.3, 3.3.0, 3.2.2
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      When `assert_true` is used after a `left_outer` join, the assertion exception is raised even though all rows meet the condition. An `inner` join does not exhibit this issue.

       

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as sf
      
      session = SparkSession.builder.getOrCreate()
      
      entries = session.createDataFrame(
          [
              ("a", 1),
              ("b", 2),
              ("c", 3),
          ],
          ["id", "outcome_id"],
      )
      
      outcomes = session.createDataFrame(
          [
              (1, 12),
              (2, 34),
              (3, 32),
          ],
          ["outcome_id", "outcome_value"],
      )
      
      # Inner join works as expected
      (
          entries.join(outcomes, on="outcome_id", how="inner")
          .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
          .filter(sf.col("valid").isNull())
          .show()
      )
      
      # Left join fails with «'('outcome_value > 10)' is not true!» even though every row satisfies the condition
      (
          entries.join(outcomes, on="outcome_id", how="left_outer")
          .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
          .filter(sf.col("valid").isNull())
          .show()
      )

      Reproduced on `pyspark` versions `3.2.1`, `3.2.0`, `3.1.2`, and `3.1.1`. I am not sure whether "native" Spark exposes this issue as well; I don't have the knowledge or setup to test that.
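To make the claim "all the rows meet the condition" concrete, here is a minimal plain-Python sketch of the left outer join over the same sample data. It does not use Spark and does not reproduce the bug; it only checks that every entry finds a matching outcome and that every joined `outcome_value` is greater than 10. The helper name `left_outer_join` is hypothetical and exists only for this illustration.

```python
# Plain-Python model of the two sample tables from the report (no Spark needed).
entries = [("a", 1), ("b", 2), ("c", 3)]  # (id, outcome_id)
outcomes = {1: 12, 2: 34, 3: 32}          # outcome_id -> outcome_value


def left_outer_join(entries, outcomes):
    """Hypothetical helper: return (outcome_id, id, outcome_value) rows.

    Entries without a matching outcome keep their row, with None as the value,
    which is how a left outer join treats unmatched left-side rows.
    """
    return [(oid, eid, outcomes.get(oid)) for eid, oid in entries]


joined = left_outer_join(entries, outcomes)

# Rows for which `outcome_value > 10` is not true. A missing value (None)
# is counted as a violation, since a NULL comparison is not true either.
violations = [row for row in joined if row[2] is None or not row[2] > 10]

print(joined)      # [(1, 'a', 12), (2, 'b', 34), (3, 'c', 32)]
print(violations)  # []
```

Since the list of violations is empty and no joined row has a missing value, the sample data gives the assertion no reason to fire after either join type.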

          People

            Assignee: bersprockets Bruce Robbins
            Reporter: StreakyCobra Fabien Dubosson