Spark / SPARK-26864

Query may return incorrect results when a Python UDF is used as a join condition and the UDF uses attributes from both legs of a left semi join.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      In SPARK-25314, we added support for a Python UDF in a join condition that refers to attributes from both legs of the join. This works by
      rewriting the plan to convert an inner join or left semi join into a filter over a cross join. For a left semi join, this transformation
      can produce incorrect results when more than one row of the right leg matches the same left row under the join condition. The cross join
      emits one row per matching pair, so that left row appears several times in the output, whereas a left semi join must return each
      qualifying left row exactly once.
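      To illustrate the difference in semantics, here is a minimal pure-Python sketch (not Spark code) that models both plans over small
      in-memory lists. The function names and the `udf_cond` predicate are hypothetical stand-ins for the planner's rewrite and for a Python UDF
      that references both legs. It assumes the rewritten plan keeps only the left-leg columns after filtering the cross join.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, ...]
Cond = Callable[[Row, Row], bool]


def left_semi_join(left: List[Row], right: List[Row], cond: Cond) -> List[Row]:
    # Left semi join semantics: emit each left row once if ANY right row matches.
    return [l for l in left if any(cond(l, r) for r in right)]


def filter_over_cross_join(left: List[Row], right: List[Row], cond: Cond) -> List[Row]:
    # Hypothetical model of the rewrite: cross join, filter by the UDF,
    # then project the left-leg columns only. One output row per matching pair.
    return [l for l in left for r in right if cond(l, r)]


def udf_cond(l: Row, r: Row) -> bool:
    # Hypothetical stand-in for a Python UDF using attributes from both legs.
    return l[0] == r[0]


left = [(1,), (2,)]
right = [(1,), (1,), (3,)]  # two right rows match the left row (1,)

print(left_semi_join(left, right, udf_cond))          # [(1,)]
print(filter_over_cross_join(left, right, udf_cond))  # [(1,), (1,)]
```

      Because two right rows match the left row (1,), the rewritten plan returns (1,) twice, while a left semi join returns it once.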


            People

            • Assignee:
              dkbiswal Dilip Biswal
              Reporter:
              dkbiswal Dilip Biswal
            • Votes:
              0
              Watchers:
              1
