[SPARK-12449] Pushing down arbitrary logical plans to data sources


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

      Description

      With the DataSource API, Spark can pull data from external sources for processing. Implementing interfaces such as PrunedFilteredScan allows filters and projections to be pushed down, so that unnecessary fields and rows are pruned directly in the data source.

      However, data sources such as SQL engines are capable of even more preprocessing, e.g., evaluating aggregates. Doing this work in the source would reduce the amount of data transferred from the source to Spark. The existing interfaces do not allow this kind of processing in the source.
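
To make the difference concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for an external SQL engine. It does not use Spark at all; the table `sales`, its columns, and the two helper functions are assumptions made purely for illustration. The first function mimics what filter and projection pushdown achieves today (the caller still aggregates), the second lets the source evaluate the aggregate as well.

```python
import sqlite3

# Stand-in for an external SQL engine. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL, note TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("EU", 2015, 10.0, "a"),
        ("EU", 2015, 20.0, "b"),
        ("US", 2015, 5.0, "c"),
        ("US", 2014, 7.0, "d"),
        ("EU", 2014, 1.0, "e"),
    ],
)


def scan_pruned_filtered(conn):
    """Filter and projection run in the source; the caller aggregates."""
    transferred = conn.execute(
        "SELECT region, amount FROM sales WHERE year = 2015"
    ).fetchall()
    totals = {}
    for region, amount in transferred:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(transferred)


def scan_with_aggregate(conn):
    """Filter, projection and aggregate all run in the source."""
    transferred = conn.execute(
        "SELECT region, SUM(amount) FROM sales WHERE year = 2015 GROUP BY region"
    ).fetchall()
    return dict(transferred), len(transferred)


pruned_totals, pruned_rows = scan_pruned_filtered(conn)
pushed_totals, pushed_rows = scan_with_aggregate(conn)
print(pruned_totals == pushed_totals, pruned_rows, pushed_rows)
```

Both variants compute the same totals, but the second transfers one row per group instead of one row per matching record. With realistic table sizes that gap is what this issue is about.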

      We propose adding a new interface, CatalystSource, that allows the processing of arbitrary logical plans to be deferred to the data source. We presented the details at Spark Summit Europe 2015: https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/
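
As a rough illustration of the idea only, the following sketch models a source that is asked whether it can handle a whole logical plan and, if so, translates it into a single SQL query. This is not the proposed CatalystSource interface and not Spark code: the plan node classes, `SqlSource`, `supports`, `to_sql` and `plan_query` are all hypothetical names invented for this example, and real Catalyst plans are far richer than these toy nodes.

```python
from dataclasses import dataclass


# Toy logical plan nodes (hypothetical; not Spark's Catalyst classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    condition: str  # SQL predicate text


@dataclass(frozen=True)
class Aggregate:
    child: object
    group_by: str
    agg_expr: str


@dataclass(frozen=True)
class Sort:
    child: object
    column: str


class SqlSource:
    """Hypothetical source that accepts scans, filters and aggregates."""

    def supports(self, plan) -> bool:
        if isinstance(plan, Scan):
            return True
        if isinstance(plan, (Filter, Aggregate)):
            return self.supports(plan.child)
        return False  # e.g. Sort is not handled by this toy source

    def to_sql(self, plan) -> str:
        if isinstance(plan, Scan):
            return f"SELECT * FROM {plan.table}"
        if isinstance(plan, Filter):
            return f"SELECT * FROM ({self.to_sql(plan.child)}) AS t WHERE {plan.condition}"
        if isinstance(plan, Aggregate):
            return (
                f"SELECT {plan.group_by}, {plan.agg_expr} "
                f"FROM ({self.to_sql(plan.child)}) AS t GROUP BY {plan.group_by}"
            )
        raise ValueError(f"unsupported plan node: {plan!r}")


def plan_query(source, plan):
    """Push the whole plan down if the source supports it, else fall back."""
    if source.supports(plan):
        return ("pushdown", source.to_sql(plan))
    return ("fallback", None)


source = SqlSource()
plan = Aggregate(Filter(Scan("sales"), "year = 2015"), "region", "SUM(amount)")
decision = plan_query(source, plan)
unsupported = plan_query(source, Sort(plan, "region"))
print(decision)
print(unsupported)
```

The sketch only decides between pushing down the entire plan and not pushing down at all. A real implementation would presumably also need to push down the largest supported subtree and leave the rest to Spark.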

      I will add a design document explaining the details.

        Attachments

        1. pushingDownLogicalPlans.pdf
          181 kB
          Stephan Kessler


              People

              • Assignee: Unassigned
              • Reporter: Stephan Kessler (stephank85)
              • Votes: 34
              • Watchers: 64
