Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-29335

Cost Based Optimizer stats are not used while evaluating query plans in Spark Sql

    XMLWordPrintableJSON

    Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Optimizer, SQL
    • Environment:

      We tried to execute the same using Spark-sql and Thrify server using SQLWorkbench but we are not able to use the stats.

      Description

      We are trying to leverage CBO for getting better plan results for few critical queries run thru spark-sql or thru thrift server using jdbc driver. 

      Following settings added to spark-defaults.conf

      spark.sql.cbo.enabled true
      spark.experimental.extrastrategies intervaljoin
      spark.sql.cbo.joinreorder.enabled true
      

       

      The tables that we are using are not partitioned.

      spark-sql> analyze table arrow.t_fperiods_sundar compute statistics ;
      analyze table arrow.t_fperiods_sundar compute statistics for columns eid, year, ptype, absref, fpid , pid ;
      analyze table arrow.t_fdata_sundar compute statistics ;
      analyze table arrow.t_fdata_sundar compute statistics for columns eid, fpid, absref;
      

      Analyze completed success fully.

      Describe extended , does not show column level stats data and queries are not leveraging table or column level stats .

      we are using Oracle as our Hive Catalog store and not Glue .

      When we are using spark sql and running queries we are not able to see the stats in use in the explain plan and we are not sure if cbo is put to use.

      A quick response would be helpful.

      Explain Plan:

      Following Explain command does not reference to any Statistics usage.

      spark-sql> *explain extended select a13.imnem,a13.fvalue,a12.ptype,a13.absref from arrow.t_fperiods_sundar a12, arrow.t_fdata_sundar a13 where a12.eid = a13.eid and a12.fpid = a13.fpid and a13.absref = 'Y2017' and a12.year = 2017 and a12.ptype = 'A' and a12.eid = 29940 and a12.PID is NULL ;*
       
      19/09/05 14:15:15 INFO FileSourceStrategy: Pruning directories with:
      19/09/05 14:15:15 INFO FileSourceStrategy: Post-Scan Filters: isnotnull(ptype#4546),isnotnull(year#4545),isnotnull(eid#4542),(year#4545 = 2017),(ptype#4546 = A),(eid#4542 = 29940),isnull(PID#4527),isnotnull(fpid#4523)
      19/09/05 14:15:15 INFO FileSourceStrategy: Output Data Schema: struct<FPID: decimal(38,0), PID: string, EID: decimal(10,0), YEAR: int, PTYPE: string ... 3 more fields>
      19/09/05 14:15:15 INFO FileSourceScanExec: Pushed Filters: IsNotNull(PTYPE),IsNotNull(YEAR),IsNotNull(EID),EqualTo(YEAR,2017),EqualTo(PTYPE,A),EqualTo(EID,29940),IsNull(PID),IsNotNull(FPID)
      19/09/05 14:15:15 INFO FileSourceStrategy: Pruning directories with:
      19/09/05 14:15:15 INFO FileSourceStrategy: Post-Scan Filters: isnotnull(absref#4569),(absref#4569 = Y2017),isnotnull(fpid#4567),isnotnull(eid#4566),(eid#4566 = 29940)
      19/09/05 14:15:15 INFO FileSourceStrategy: Output Data Schema: struct<IMNEM: string, FVALUE: string, EID: decimal(10,0), FPID: decimal(10,0), ABSREF: string ... 3 more fields>
      19/09/05 14:15:15 INFO FileSourceScanExec: Pushed Filters: IsNotNull(ABSREF),EqualTo(ABSREF,Y2017),IsNotNull(FPID),IsNotNull(EID),EqualTo(EID,29940)
      == Parsed Logical Plan ==
      'Project ['a13.imnem, 'a13.fvalue, 'a12.ptype, 'a13.absref]
      +- 'Filter (((('a12.eid = 'a13.eid) && ('a12.fpid = 'a13.fpid)) && (('a13.absref = Y2017) && ('a12.year = 2017))) && ((('a12.ptype = A) && ('a12.eid = 29940)) && isnull('a12.PID)))
       +- 'Join Inner
       :- 'SubqueryAlias a12
       : +- 'UnresolvedRelation `arrow`.`t_fperiods_sundar`
       +- 'SubqueryAlias a13
       +- 'UnresolvedRelation `arrow`.`t_fdata_sundar`
       
      == Analyzed Logical Plan ==
      imnem: string, fvalue: string, ptype: string, absref: string
      Project [imnem#4548, fvalue#4552, ptype#4546, absref#4569]
      +- Filter ((((eid#4542 = eid#4566) && (cast(fpid#4523 as decimal(38,0)) = cast(fpid#4567 as decimal(38,0)))) && ((absref#4569 = Y2017) && (year#4545 = 2017))) && (((ptype#4546 = A) && (cast(eid#4542 as decimal(10,0)) = cast(cast(29940 as decimal(5,0)) as decimal(10,0)))) && isnull(PID#4527)))
       +- Join Inner
       :- SubqueryAlias a12
       : +- SubqueryAlias t_fperiods_sundar
       : +- Relation[FPID#4523,QTR#4524,ABSREF#4525,DSET#4526,PID#4527,NID#4528,CCD#4529,LOCKDATE#4530,LOCKUSER#4531,UPDUSER#4532,UPDDATE#4533,RESTATED#4534,DATADATE#4535,DATASTATE#4536,ACCSTD#4537,INDFMT#4538,DATAFMT#4539,CONSOL#4540,FYR#4541,EID#4542,FPSTATE#4543,BATCH_ID#4544L,YEAR#4545,PTYPE#4546] parquet
       +- SubqueryAlias a13
       +- SubqueryAlias t_fdata_sundar
       +- Relation[FDID#4547,IMNEM#4548,DSET#4549,DSRS#4550,PID#4551,FVALUE#4552,FCOMP#4553,CONF#4554,UPDDATE#4555,UPDUSER#4556,SEQ#4557,CCD#4558,NID#4559,CONVTYPE#4560,DATADATE#4561,DCOMMENT#4562,UDOMAIN#4563,FPDID#4564L,VLP_CALC#4565,EID#4566,FPID#4567,BATCH_ID#4568L,ABSREF#4569] parquet
       
      == Optimized Logical Plan ==
      Project [imnem#4548, fvalue#4552, ptype#4546, absref#4569]
      +- Join Inner, ((eid#4542 = eid#4566) && (fpid#4523 = cast(fpid#4567 as decimal(38,0))))
       :- Project [FPID#4523, EID#4542, PTYPE#4546]
       : +- Filter (((((((isnotnull(ptype#4546) && isnotnull(year#4545)) && isnotnull(eid#4542)) && (year#4545 = 2017)) && (ptype#4546 = A)) && (eid#4542 = 29940)) && isnull(PID#4527)) && isnotnull(fpid#4523))
       : +- Relation[FPID#4523,QTR#4524,ABSREF#4525,DSET#4526,PID#4527,NID#4528,CCD#4529,LOCKDATE#4530,LOCKUSER#4531,UPDUSER#4532,UPDDATE#4533,RESTATED#4534,DATADATE#4535,DATASTATE#4536,ACCSTD#4537,INDFMT#4538,DATAFMT#4539,CONSOL#4540,FYR#4541,EID#4542,FPSTATE#4543,BATCH_ID#4544L,YEAR#4545,PTYPE#4546] parquet
       +- Project [IMNEM#4548, FVALUE#4552, EID#4566, FPID#4567, ABSREF#4569]
       +- Filter ((((isnotnull(absref#4569) && (absref#4569 = Y2017)) && isnotnull(fpid#4567)) && isnotnull(eid#4566)) && (eid#4566 = 29940))
       +- Relation[FDID#4547,IMNEM#4548,DSET#4549,DSRS#4550,PID#4551,FVALUE#4552,FCOMP#4553,CONF#4554,UPDDATE#4555,UPDUSER#4556,SEQ#4557,CCD#4558,NID#4559,CONVTYPE#4560,DATADATE#4561,DCOMMENT#4562,UDOMAIN#4563,FPDID#4564L,VLP_CALC#4565,EID#4566,FPID#4567,BATCH_ID#4568L,ABSREF#4569] parquet
       
      == Physical Plan ==
      *(2) Project [imnem#4548, fvalue#4552, ptype#4546, absref#4569]
      +- *(2) BroadcastHashJoin [eid#4542, fpid#4523], [eid#4566, cast(fpid#4567 as decimal(38,0))], Inner, BuildLeft
       :- BroadcastExchange HashedRelationBroadcastMode(List(input[1, decimal(10,0), true], input[0, decimal(38,0), true]))
       : +- *(1) Project [FPID#4523, EID#4542, PTYPE#4546]
       : +- *(1) Filter (((((((isnotnull(ptype#4546) && isnotnull(year#4545)) && isnotnull(eid#4542)) && (year#4545 = 2017)) && (ptype#4546 = A)) && (eid#4542 = 29940)) && isnull(PID#4527)) && isnotnull(fpid#4523))
       : +- *(1) FileScan parquet arrow.t_fperiods_sundar[FPID#4523,PID#4527,EID#4542,YEAR#4545,PTYPE#4546] Batched: true, Format: Parquet, Location: InMemoryFileIndex[s3://spr-idf-uat-processed/replication/base/APPLICATION_SCHEMA/ARROW/FPERIODS_S..., PartitionFilters: [], PushedFilters: [IsNotNull(PTYPE), IsNotNull(YEAR), IsNotNull(EID), EqualTo(YEAR,2017), EqualTo(PTYPE,A), EqualTo..., ReadSchema: struct<FPID:decimal(38,0),PID:string,EID:decimal(10,0),YEAR:int,PTYPE:string>
       +- *(2) Project [IMNEM#4548, FVALUE#4552, EID#4566, FPID#4567, ABSREF#4569]
       +- *(2) Filter ((((isnotnull(absref#4569) && (absref#4569 = Y2017)) && isnotnull(fpid#4567)) && isnotnull(eid#4566)) && (eid#4566 = 29940))
       +- *(2) FileScan parquet arrow.t_fdata_sundar[IMNEM#4548,FVALUE#4552,EID#4566,FPID#4567,ABSREF#4569] Batched: true, Format: Parquet, Location: InMemoryFileIndex[s3://spr-idf-uat-processed/replication/base/APPLICATION_SCHEMA/ARROW/FDATA_SUNDAR], PartitionFilters: [], PushedFilters: [IsNotNull(ABSREF), EqualTo(ABSREF,Y2017), IsNotNull(FPID), IsNotNull(EID), EqualTo(EID,29940)], ReadSchema: struct<IMNEM:string,FVALUE:string,EID:decimal(10,0),FPID:decimal(10,0),ABSREF:string>
      Time taken: 0.35 seconds, Fetched 1 row(s)
      

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              idfspark Srini E
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: