SPARK-35355: Improve execution performance in insert...select...limit case

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      For `insert into...select...limit` queries, `CollectLimitExec` executes faster than `GlobalLimit`. The physical plans for the same query before and after the change are:

      Before:

      == Physical Plan ==
      Execute InsertIntoHadoopFsRelationCommand ...
      +- *(2) GlobalLimit 5
         +- Exchange SinglePartition, true, id=#39
            +- *(1) LocalLimit 5
               +- *(1) ColumnarToRow
                  +- FileScan ...
      

      After:

      == Physical Plan ==
      Execute InsertIntoHadoopFsRelationCommand ...
      +- CollectLimit 5
         +- *(1) ColumnarToRow
            +- FileScan ....
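
      A minimal sketch for inspecting the physical plan of such a query, assuming a local `SparkSession` with Spark on the classpath. The table names `src` and `dst`, the column `id`, and the helper `explainStatement` are hypothetical and not part of this issue. The printed plan depends on the Spark version and configuration in use.

```scala
import org.apache.spark.sql.SparkSession

object InsertLimitPlan {
  // Hypothetical helper: builds the EXPLAIN statement for an
  // insert...select...limit query over the given tables.
  def explainStatement(target: String, source: String, limit: Int): String =
    s"EXPLAIN INSERT INTO $target SELECT id FROM $source LIMIT $limit"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("insert-select-limit-plan")
      .getOrCreate()

    // Hypothetical source table backed by Parquet files.
    spark.range(0, 1000).write.mode("overwrite").format("parquet").saveAsTable("src")
    // Hypothetical target table with a matching schema.
    spark.sql("CREATE TABLE IF NOT EXISTS dst (id BIGINT) USING parquet")

    // Print the physical plan without truncating the plan text.
    spark.sql(explainStatement("dst", "src", 5)).show(truncate = false)

    spark.stop()
  }
}
```

      Comparing the output on builds with and without the change should show whether the limit is planned as `GlobalLimit` over an `Exchange` or as a single `CollectLimit`.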
      

    People

    • Assignee: Unassigned
    • Reporter: kaifeiYi yikf