  1. Spark
  2. SPARK-37369

Avoid redundant ColumnarToRow transition on InMemoryTableScan


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      We have a rule that inserts columnar transitions between row-based and columnar query plans. InMemoryTableScanExec can produce columnar output, so if its parent plan isn't columnar, the rule adds a ColumnarToRow between them.

      But InMemoryTableScanExec is a special query plan: it can convert a cached batch to either a columnar batch or rows.

      In this case, we currently ask InMemoryTableScanExec to convert the cached batch to a columnar batch, and the added ColumnarToRow then converts that batch to rows before the parent plan consumes them.

      So here we can simply ask InMemoryTableScanExec to produce row output directly and skip the redundant conversion. For example, the ColumnarToRow in the plan below is redundant:

      ```
      +- Union
         :- ColumnarToRow
         :  +- InMemoryTableScan i#8, j#9
         :        +- InMemoryRelation i#8, j#9, StorageLevel(disk, memory, deserialized, 1 replicas)
      ```
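
The idea described above can be sketched with a small toy model. This is not Spark code: the `Plan` class, the `supports_columnar` / `supports_row_based` flags, the `honor_row_support` switch, and the `ColumnarOnlyScan` node are all hypothetical names chosen for illustration. The sketch assumes that a transition is only needed when a node can emit columnar batches and cannot emit rows itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Toy plan node (hypothetical, not Spark's SparkPlan)."""
    name: str
    children: List["Plan"] = field(default_factory=list)
    supports_columnar: bool = False   # node can emit columnar batches
    supports_row_based: bool = True   # node can emit rows directly


def insert_transitions(plan: Plan, honor_row_support: bool = True) -> Plan:
    """Return a copy of `plan` whose output is row-based.

    With honor_row_support=False, every columnar-capable node is wrapped
    in ColumnarToRow (the redundant behaviour described in the issue).
    With honor_row_support=True, a node that can also emit rows is left
    unwrapped and is expected to produce rows itself.
    """
    can_emit_rows = honor_row_support and plan.supports_row_based
    if plan.supports_columnar and not can_emit_rows:
        # Assumption of this toy model: such nodes are leaves, so the
        # subtree is wrapped as-is without recursing into it.
        return Plan("ColumnarToRow", [plan])
    new_children = [insert_transitions(c, honor_row_support) for c in plan.children]
    return Plan(plan.name, new_children, plan.supports_columnar, plan.supports_row_based)


def render(plan: Plan, depth: int = 0) -> List[str]:
    """Render the plan as indented lines, one node per line."""
    lines = ["  " * depth + plan.name]
    for child in plan.children:
        lines.extend(render(child, depth + 1))
    return lines


# A cached scan that can emit either format, next to a columnar-only scan.
scan = Plan("InMemoryTableScan", supports_columnar=True, supports_row_based=True)
columnar_only = Plan("ColumnarOnlyScan", supports_columnar=True, supports_row_based=False)
union = Plan("Union", [scan, columnar_only])

print("\n".join(render(insert_transitions(union, honor_row_support=False))))
print("\n".join(render(insert_transitions(union, honor_row_support=True))))
```

In this sketch the first call wraps both scans in `ColumnarToRow`, while the second leaves `InMemoryTableScan` unwrapped and keeps the transition only above the columnar-only node.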

          People

            viirya L. C. Hsieh