Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-40362

Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.3.0
    • 3.3.1
    • SQL

    Description

      In the canonicalization code which is now in two stages, canonicalization involving Commutative operators is broken, if they are subexpressions of certain type of expressions which override precanonicalize, for example BinaryComparison 

      Consider following expression:

      a + b > 10

               GT

                  |

      a + b          10

      The BinaryComparison operator in the precanonicalize, first precanonicalizes  children & then   may swap operands based on left /right hashCode inequality..

      lets say  Add(a + b) .hashCode is >  10.hashCode as a result GT is converted to LT

      But If the same tree is created 

                 GT

                  |

       b + a      10

      The hashCode of Add(b, a) is not same as Add(a, b) , thus it is possible that for this tree

       Add(b + a) .hashCode is <  10.hashCode  in which case GT remains as is.

      Thus to similar trees result in different canonicalization , one having GT other having LT 

       

      The problem occurs because  for commutative expressions the canonicalization normalizes the expression with consistent hashCode which is not the case with precanonicalize as the hashCode of commutative expression 's precanonicalize and post canonicalize are different.

       

       

      The test 

      test("bug X")
      Unknown macro: {     val tr1 = LocalRelation('c.int, 'b.string, 'a.int)    val y = tr1.where('a.attr + 'c.attr > 10).analyze    val fullCond = y.asInstanceOf[Filter].condition.clone()   val addExpr = (fullCond match Unknown macro}
      ).clone().asInstanceOf[Add]
      val canonicalizedFullCond = fullCond.canonicalized

      // swap the operands of add
      val newAddExpr = Add(addExpr.right, addExpr.left)

      // build a new condition which is same as the previous one, but with operands of //Add reversed 

      val builtCondnCanonicalized = GreaterThan(newAddExpr, Literal(10)).canonicalized

      assertEquals(canonicalizedFullCond, builtCondnCanonicalized)
      }

      This test fails.

      The fix which I propose is that for the commutative expressions, the precanonicalize should be overridden and  Canonicalize.reorderCommutativeOperators be invoked on the expression instead of at place of canonicalize. effectively for commutative operands ( add, or , multiply , and etc) canonicalize and precanonicalize should be same.

      PR:

      https://github.com/apache/spark/pull/37824

       

       

      I am also trying a better fix, where by the idea is that for commutative expressions the murmur hashCode are caluculated using unorderedHash so that it is order  independent ( i.e symmetric).

      The above approach works fine , but in case of Least & Greatest, the Product's element is  a Seq,  and that messes with consistency of hashCode.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            petertoth Peter Toth
            ashahid7 Asif
            Wenchen Fan Wenchen Fan
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - 72h
                72h
                Remaining:
                Remaining Estimate - 72h
                72h
                Logged:
                Time Spent - Not Specified
                Not Specified

                Slack

                  Issue deployment