XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.9
    • Fix Version/s: 3.0
    • Component/s: spark
    • Labels:
      None
    • Ignite Flags:
      Docs Required, Release Notes Required

      Description

      Also, in Spark was fixed bug with incorrect null handling on columns in codition

      https://issues.apache.org/jira/browse/SPARK-21479

      It leads to IgniteOptimizationJoinSpec fixes (the same thing was in the previous migration from 2.2 to 2.3)

       

      Also, in Spark was fixed bug with incorrect null handling on columns in codition

      https://issues.apache.org/jira/browse/SPARK-21479

      It leads to IgniteOptimizationJoinSpec fixes (the same thing was in the previous migration from 2.2 to 2.3)

       

       

      I made experiment with Spark code for version 2.3 it generates the next plan

      == Parsed Logical Plan ==
      'Project 'jt1.id AS id1#28, 'jt1.val1, 'jt2.id AS id2#29, 'jt2.val2
      +- 'Join LeftOuter, ('jt1.val1 = 'jt2.val2)
      :- 'UnresolvedRelation `jt1`
      +- 'UnresolvedRelation `jt2`

      == Analyzed Logical Plan ==
      id1: string, val1: string, id2: string, val2: string
      Project id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25
      +- Join LeftOuter, (val1#11 = val2#25)
      :- SubqueryAlias jt1
      : +- Relationid#10,val1#11 csv
      +- SubqueryAlias jt2
      +- Relationid#24,val2#25 csv

      == Optimized Logical Plan ==
      Project id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25
      +- Join LeftOuter, (val1#11 = val2#25)
      :- Relationid#10,val1#11 csv
      +- Relationid#24,val2#25 csv

       

      The 2.4 generates

      == Parsed Logical Plan ==
      'Project 'jt1.id AS id1#28, 'jt1.val1, 'jt2.id AS id2#29, 'jt2.val2
      +- 'Join LeftOuter, ('jt1.val1 = 'jt2.val2)
      :- 'UnresolvedRelation `jt1`
      +- 'UnresolvedRelation `jt2`

      == Analyzed Logical Plan ==
      id1: string, val1: string, id2: string, val2: string
      Project id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25
      +- Join LeftOuter, (val1#11 = val2#25)
      :- SubqueryAlias `jt1`
      : +- Relationid#10,val1#11 csv
      +- SubqueryAlias `jt2`
      +- Relationid#24,val2#25 csv

      == Optimized Logical Plan ==
      Project id#10 AS id1#28, val1#11, id#24 AS id2#29, val2#25
      +- Join LeftOuter, (val1#11 = val2#25)
      :- Relationid#10,val1#11 csv
      +- Filter isnotnull(val2#25)
      +- Relationid#24,val2#25 csv

       

      The +- Filter isnotnull(val2#25) is added in optimized logical plan

      But in reality it doesn't work properly (and doesn't filter in Spark), but wow! It works for Ignite (because we honestly work with Spark plan)

       

      If you enable next option 

      .config("ignite.disableSparkSQLOptimization", "true") - the behaviour will be the same in Ignite and Spark and will not add the filter

       

      The best approach for Spark 2.4 - disableSparkOptimization before fixing on Spark side (I could create a bug for this)

        Attachments

          Activity

            People

            • Assignee:
              zaleslaw Alexey Zinoviev
              Reporter:
              zaleslaw Alexey Zinoviev
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: