Spark / SPARK-24385

Trivially-true EqualNullSafe should be handled like EqualTo in Dataset.join


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.3.2, 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Dataset.join(right: Dataset[_], joinExprs: Column, joinType: String) has special logic that resolves trivially true predicates (both operands refer to the same attribute, as in a self-join) against both sides of the join. It currently handles regular equality (EqualTo) but not null-safe equality (EqualNullSafe); the code should be updated to handle both.
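      For readers unfamiliar with the distinction, here is a minimal sketch in plain Python (not Spark code) of how regular and null-safe equality treat NULL, with SQL NULL modeled as None. The function names sql_equal and sql_equal_null_safe are hypothetical and exist only for illustration.

```python
from typing import Any, Optional


def sql_equal(a: Any, b: Any) -> Optional[bool]:
    """Regular equality: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_equal_null_safe(a: Any, b: Any) -> bool:
    """Null-safe equality: never NULL; two NULLs compare as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b
```

      Under this model, a join on regular equality drops rows whose key is NULL on both sides, while a join on null-safe equality keeps them.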

      PySpark example:

      from pyspark.sql import functions as F

      df = spark.range(10)
      df.join(df, 'id').collect()  # This works.
      df.join(df, df['id'] == df['id']).collect()  # This works.
      df.join(df, df['id'].eqNullSafe(df['id'])).collect()  # This fails.

      # Workaround: re-project the column so the right side gets a distinct reference.
      df2 = df.withColumn('id', F.col('id'))
      df.join(df2, df['id'].eqNullSafe(df2['id'])).collect()

      The relevant code in Dataset.join should look like this:

      // Otherwise, find the trivially true predicates and automatically resolve them to both sides.
      // By the time we get here, since we have already run analysis, all attributes should've been
      // resolved and become AttributeReference.
      val cond = plan.condition.map { _.transform {
        case catalyst.expressions.EqualTo(a: AttributeReference, b: AttributeReference)
            if a.sameRef(b) =>
          catalyst.expressions.EqualTo(
            withPlan(plan.left).resolve(a.name),
            withPlan(plan.right).resolve(b.name))
        // New case: handle a trivially true null-safe equality the same way.
        case catalyst.expressions.EqualNullSafe(a: AttributeReference, b: AttributeReference)
            if a.sameRef(b) =>
          catalyst.expressions.EqualNullSafe(
            withPlan(plan.left).resolve(a.name),
            withPlan(plan.right).resolve(b.name))
      }}
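
      The rewrite in the Scala snippet can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions: Attr, EqualTo, EqualNullSafe, and resolve_trivially_true are hypothetical stand-ins for the Catalyst classes, plain dicts stand in for resolving a column name against each side of the join, and the handle_null_safe flag is added purely to contrast the current and proposed behavior.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for AttributeReference."""
    name: str
    expr_id: int  # identity of the underlying column

    def same_ref(self, other: "Attr") -> bool:
        return self.expr_id == other.expr_id


@dataclass(frozen=True)
class EqualTo:
    left: Attr
    right: Attr


@dataclass(frozen=True)
class EqualNullSafe:
    left: Attr
    right: Attr


Predicate = Union[EqualTo, EqualNullSafe]


def resolve_trivially_true(
    pred: Predicate,
    left: Dict[str, Attr],
    right: Dict[str, Attr],
    handle_null_safe: bool = True,
) -> Predicate:
    """Re-resolve a predicate whose operands are the same reference.

    The left operand is looked up by name in the left input and the right
    operand in the right input, so the condition compares the two sides.
    """
    handled = (EqualTo, EqualNullSafe) if handle_null_safe else (EqualTo,)
    if isinstance(pred, handled) and pred.left.same_ref(pred.right):
        return type(pred)(left[pred.left.name], right[pred.right.name])
    return pred


# Self-join model: the right side exposes "id" under a different identity.
left_side = {"id": Attr("id", 1)}
right_side = {"id": Attr("id", 2)}
same = Attr("id", 1)

# Current behavior (modeled): the null-safe predicate is left untouched.
before = resolve_trivially_true(
    EqualNullSafe(same, same), left_side, right_side, handle_null_safe=False
)
# Proposed behavior (modeled): it is rewritten just like EqualTo.
after = resolve_trivially_true(EqualNullSafe(same, same), left_side, right_side)
```

      In this model, before still compares the column with itself, while after compares the left input's id with the right input's id, which is the condition the user intended.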
      


            People

            • Assignee: Marco Gaido (mgaido)
            • Reporter: Daniel Shields (manus)