Spark / SPARK-23521, a sub-task of SPARK-25531 (new write APIs for data source v2)

SPIP: Standardize SQL logical plans with DataSourceV2

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Executive Summary: This SPIP is based on discussion about the DataSourceV2 implementation on the dev list. The proposal is to standardize the logical plans used for write operations to make the planner more maintainable and to make Spark's write behavior predictable and reliable. It proposes the following principles:

      1. Use well-defined logical plan nodes for all high-level operations: insert, create, CTAS, overwrite table, etc.
      2. Use planner rules that match on these high-level nodes, so that it isn’t necessary to create rules to match each eventual code path individually.
      3. Clearly define Spark’s behavior for these logical plan nodes. Physical nodes should implement that behavior so that all code paths eventually make the same guarantees.
      4. Specialize implementations when creating physical plans, not logical plans. This avoids behavior drift and ensures that planner code is shared across physical implementations.
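
The principles above can be illustrated with a small model. This is a minimal Python sketch under stated assumptions, not Spark's planner or API: the node names as modeled here (`AppendData`, `OverwriteTable`, `CreateTableAsSelect`), the `supports_truncate` capability flag, and the physical operator names (`AppendExec`, `TruncateAndAppendExec`, `OverwriteExec`, `CreateTableAsSelectExec`) are all hypothetical. It shows one planning function that matches on each high-level logical node and defers source-specific choices to physical planning.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical high-level logical write nodes, one per operation.
# `query` stands in for the child logical plan that produces the rows.
@dataclass(frozen=True)
class AppendData:
    table: str
    query: str


@dataclass(frozen=True)
class OverwriteTable:
    table: str
    query: str


@dataclass(frozen=True)
class CreateTableAsSelect:
    table: str
    query: str


LogicalWrite = Union[AppendData, OverwriteTable, CreateTableAsSelect]


@dataclass(frozen=True)
class TableInfo:
    name: str
    supports_truncate: bool  # hypothetical source capability flag


def plan_write(node: LogicalWrite, catalog: Dict[str, TableInfo]) -> str:
    """Plan one logical write node into a (named) physical operator.

    Each branch matches a high-level node, not an individual code path.
    Source-specific choices happen only here, at physical planning time,
    so every source shares the same logical plan and its guarantees.
    """
    if isinstance(node, AppendData):
        return f"AppendExec({node.table})"
    if isinstance(node, OverwriteTable):
        # Specialization: same logical node, different physical operator.
        if catalog[node.table].supports_truncate:
            return f"TruncateAndAppendExec({node.table})"
        return f"OverwriteExec({node.table})"
    if isinstance(node, CreateTableAsSelect):
        return f"CreateTableAsSelectExec({node.table})"
    raise TypeError(f"unsupported logical write: {type(node).__name__}")


if __name__ == "__main__":
    catalog = {
        "events": TableInfo("events", supports_truncate=True),
        "logs": TableInfo("logs", supports_truncate=False),
    }
    print(plan_write(OverwriteTable("events", "SELECT ..."), catalog))  # prints TruncateAndAppendExec(events)
    print(plan_write(OverwriteTable("logs", "SELECT ..."), catalog))  # prints OverwriteExec(logs)
```

Because the rest of the pipeline sees only these three node types, supporting a source with different capabilities changes only `plan_write`, never the logical plans themselves.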

      The SPIP doc presents a small but complete set of those high-level logical operations, most of which are already defined in SQL or implemented by some write path in Spark.
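
One way to pin down what each high-level operation guarantees is a reference model that every physical path can be tested against. The sketch below is hypothetical: the `InMemoryCatalog` class, its method names, and the exact semantics it encodes (CREATE and CTAS fail if the table exists; INSERT and OVERWRITE require an existing table; OVERWRITE replaces all rows) are assumptions chosen for illustration, not taken from the SPIP doc.

```python
from typing import Dict, List

Row = tuple  # hypothetical: rows are modeled as plain tuples


class TableAlreadyExists(Exception):
    pass


class NoSuchTable(Exception):
    pass


class InMemoryCatalog:
    """Hypothetical reference model of per-operation write guarantees.

    The semantics below are assumptions for illustration; a real
    physical implementation would be checked against whatever behavior
    the logical node is defined to have.
    """

    def __init__(self) -> None:
        self.tables: Dict[str, List[Row]] = {}

    def create(self, name: str) -> None:
        # CREATE TABLE (assumed): fails if the table already exists.
        if name in self.tables:
            raise TableAlreadyExists(name)
        self.tables[name] = []

    def ctas(self, name: str, rows: List[Row]) -> None:
        # CTAS (assumed): fails if the table exists; else creates and fills it.
        if name in self.tables:
            raise TableAlreadyExists(name)
        self.tables[name] = list(rows)

    def insert(self, name: str, rows: List[Row]) -> None:
        # INSERT INTO (assumed): table must exist; existing rows are kept.
        if name not in self.tables:
            raise NoSuchTable(name)
        self.tables[name].extend(rows)

    def overwrite(self, name: str, rows: List[Row]) -> None:
        # INSERT OVERWRITE (assumed): table must exist; all rows are replaced.
        if name not in self.tables:
            raise NoSuchTable(name)
        self.tables[name] = list(rows)


if __name__ == "__main__":
    catalog = InMemoryCatalog()
    catalog.ctas("t", [(1,)])
    catalog.insert("t", [(2,)])
    print(catalog.tables["t"])  # prints [(1,), (2,)]
    catalog.overwrite("t", [(3,)])
    print(catalog.tables["t"])  # prints [(3,)]
```

A conformance test could run the same sequence of operations through this model and through a physical implementation, then compare the resulting tables and raised errors.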

          People

            Assignee: Unassigned
            Reporter: Ryan Blue (rdblue)
            Votes: 4
            Watchers: 19