  1. Spark
  2. SPARK-24385

Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.3.2, 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Dataset.join(right: Dataset[_], joinExprs: Column, joinType: String) has special logic that finds trivially true predicates (an equality whose two operands refer to the same attribute) and re-resolves them so that one operand binds to each side of the join. It currently handles regular equality (EqualTo) but not null-safe equality (EqualNullSafe); the code should be updated to handle both.

      PySpark example:

      from pyspark.sql import functions as F

      df = spark.range(10)  # `spark` is an existing SparkSession, e.g. from the pyspark shell
      df.join(df, 'id').collect() # This works.
      df.join(df, df['id'] == df['id']).collect() # This works.
      df.join(df, df['id'].eqNullSafe(df['id'])).collect() # This fails!!!
      
      # This is a workaround that works.
      df2 = df.withColumn('id', F.col('id'))
      df.join(df2, df['id'].eqNullSafe(df2['id'])).collect()
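For context on why the null-safe form matters in a join condition, here is a minimal pure-Python sketch of the two comparison semantics. It does not use Spark. The helper names (equal_to, equal_null_safe, inner_join) and the sample values are assumptions made for illustration, with None standing in for SQL NULL.

```python
from typing import Optional


def equal_to(a, b) -> Optional[bool]:
    # Models SQL `=`: the result is unknown (None) when either operand is NULL.
    if a is None or b is None:
        return None
    return a == b


def equal_null_safe(a, b) -> bool:
    # Models SQL `<=>`: never unknown; two NULLs compare as equal.
    if a is None or b is None:
        return a is None and b is None
    return a == b


def inner_join(left, right, predicate):
    # An inner join keeps only the pairs for which the predicate is true.
    return [(l, r) for l in left for r in right if predicate(l, r) is True]


# Illustrative key column containing one NULL (not taken from the issue).
ids = [1, 2, None]
regular = inner_join(ids, ids, equal_to)
null_safe = inner_join(ids, ids, equal_null_safe)
```

Under these assumptions, regular contains only the pairs for 1 and 2, while null_safe also keeps the (None, None) pair. That extra pair is what a user is asking for when writing eqNullSafe in the join condition.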

      The relevant code in Dataset.join should look like this:

      // Otherwise, find the trivially true predicates and automatically resolves them to both sides.
      // By the time we get here, since we have already run analysis, all attributes should've been
      // resolved and become AttributeReference.
      val cond = plan.condition.map { _.transform {
        case catalyst.expressions.EqualTo(a: AttributeReference, b: AttributeReference) if a.sameRef(b) =>
          catalyst.expressions.EqualTo(
            withPlan(plan.left).resolve(a.name),
            withPlan(plan.right).resolve(b.name))
        // This case is new!!!
        case catalyst.expressions.EqualNullSafe(a: AttributeReference, b: AttributeReference) if a.sameRef(b) =>
          catalyst.expressions.EqualNullSafe(
            withPlan(plan.left).resolve(a.name),
            withPlan(plan.right).resolve(b.name))
      }}
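The rewrite above can be modelled outside Spark to see what changes. The following is a toy sketch in pure Python, not Catalyst code: Attr, EqualTo, EqualNullSafe, And, and rewrite are hypothetical stand-ins, the integer expr_id imitates an attribute's expression ID, and the two dictionaries imitate resolving a column name against the left and right plans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    # Hypothetical stand-in for a resolved attribute reference.
    name: str
    expr_id: int

    def same_ref(self, other: "Attr") -> bool:
        return self.expr_id == other.expr_id


@dataclass(frozen=True)
class EqualTo:
    left: Attr
    right: Attr


@dataclass(frozen=True)
class EqualNullSafe:
    left: Attr
    right: Attr


@dataclass(frozen=True)
class And:
    left: object
    right: object


def rewrite(cond, left_out, right_out, handle_null_safe=True):
    # handle_null_safe=False imitates the behaviour described in this report,
    # where only EqualTo is re-resolved.
    kinds = (EqualTo, EqualNullSafe) if handle_null_safe else (EqualTo,)
    if isinstance(cond, And):
        return And(
            rewrite(cond.left, left_out, right_out, handle_null_safe),
            rewrite(cond.right, left_out, right_out, handle_null_safe),
        )
    if isinstance(cond, kinds) and cond.left.same_ref(cond.right):
        # Trivially true predicate: bind one operand to each side of the join.
        return type(cond)(left_out[cond.left.name], right_out[cond.right.name])
    return cond


# In a self-join the right side is assumed to expose fresh expression IDs.
left_out = {"id": Attr("id", 0)}
right_out = {"id": Attr("id", 1)}
trivial = EqualNullSafe(left_out["id"], left_out["id"])

before_fix = rewrite(trivial, left_out, right_out, handle_null_safe=False)
after_fix = rewrite(trivial, left_out, right_out)
```

In this model, before_fix is still the trivially true comparison of the attribute with itself, while after_fix compares the left attribute with the right one. The sketch only recurses through And, whereas the Scala transform visits the whole expression tree.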
      

          People

            Assignee: Marco Gaido (mgaido)
            Reporter: Daniel Shields (manus)