Spark / SPARK-22386

Data Source V2 improvements


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

      Issue Links

      1. Limit push down (Sub-task, Resolved, Unassigned)
      2. Aggregate push down (Sub-task, Resolved, Unassigned)
      3. add `MetadataCreationSupport` trait to separate data and metadata handling at write path (Sub-task, Resolved, Unassigned)
      4. DataSourceV2 should use immutable trees. (Sub-task, Resolved, Ryan Blue)
      5. DataSourceV2 should support named tables in DataFrameReader, DataFrameWriter (Sub-task, Resolved, Unassigned)
      6. Reorganize packages in data source V2 (Sub-task, Resolved, Gengliang Wang)
      7. DataSourceV2 should apply some validation when writing. (Sub-task, Resolved, Unassigned)
      8. DataSourceV2 should use the output commit coordinator. (Sub-task, Resolved, Ryan Blue)
      9. DataSourceV2 readers should always produce InternalRow. (Sub-task, Resolved, Ryan Blue)
      10. DataSourceOptions should handle path and table names to avoid confusion. (Sub-task, Resolved, Wenchen Fan)
      11. use InternalRow in DataSourceWriter (Sub-task, Resolved, Wenchen Fan)
      12. DataSourceV2 should provide a way to get a source's schema. (Sub-task, Resolved, Unassigned)
      13. DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema (Sub-task, Resolved, Ryan Blue)
      14. DataSourceV2: Rename DataReaderFactory to InputPartition. (Sub-task, Resolved, Ryan Blue)
      15. Data Source V2: Join Push Down (Sub-task, Resolved, Unassigned)
      16. DataSourceV2 should push filters and projection at physical plan conversion (Sub-task, Resolved, Ryan Blue)
      17. remove SupportsDeprecatedScanRow (Sub-task, Resolved, Wenchen Fan)
      18. Add support for USING syntax for DataSourceV2 (Sub-task, Resolved, Unassigned)
      19. merge ReadSupport and ReadSupportWithSchema (Sub-task, Resolved, Wenchen Fan)
      20. DataSourceV2: Remove SupportsPushDownCatalystFilters (Sub-task, Resolved, Reynold Xin)
      21. DataSourceV2: Add interfaces to pass required sorting and clustering for writes (Sub-task, Resolved, Unassigned)
      22. DataSourceV2: Structured Streaming does not respect SessionConfigSupport (Sub-task, Resolved, Hyukjin Kwon)
      23. Avoid to create a readsupport at write path in Data Source V2 (Sub-task, Resolved, Hyukjin Kwon)
      24. Recover options and properties and pass them back into the v1 API (Sub-task, Open, Unassigned)
      25. DataSourceV2: Add new DataFrameWriter API for v2 (Sub-task, Resolved, Ryan Blue)
      26. Pass in number of partitions to BuildWriter (Sub-task, Resolved, Ximo Guanter)
      27. DataSource V2: API to request distribution and ordering on write (Sub-task, Resolved, Anton Okolnychyi)
      28. Data Source V2: Remove read specific distributions (Sub-task, Open, Unassigned)
      29. DataSource V2: Build logical writes in the optimizer (Sub-task, Resolved, Anton Okolnychyi)
      30. DataSource V2: Inject repartition and sort nodes to satisfy required distribution and ordering (Sub-task, Resolved, Anton Okolnychyi)
      31. DataSource V2: Use Write abstraction in StreamExecution (Sub-task, Resolved, Anton Okolnychyi)
      32. DataSource V2: Support required distribution and ordering in SS (Sub-task, Resolved, Anton Okolnychyi)
      33. Let AQE determine the right parallelism in DistributionAndOrderingUtils (Sub-task, Open, Unassigned)
      34. DS V2 Aggregate push down (Sub-task, Resolved, Huaxin Gao)
      35. Aggregate (Min/Max/Count) push down for ORC (Sub-task, Resolved, Cheng Su)
      36. Aggregate (Min/Max/Count) push down for Parquet (Sub-task, Resolved, Huaxin Gao)
      37. Push down group by partition column for Aggregate (Min/Max/Count) for Parquet (Sub-task, Resolved, Huaxin Gao)
      38. Push down filter by partition column for Aggregate (Min/Max/Count) for Parquet (Sub-task, Resolved, Huaxin Gao)
      39. Add benchmark for aggregate push down (Sub-task, Open, Unassigned)
      40. Do not split input file for Parquet reader with aggregate push down (Sub-task, Resolved, Cheng Su)
      41. Not log empty aggregate and group by in JDBCScan (Sub-task, Resolved, Huaxin Gao)
      42. DataSourceV2: Distribution and ordering support V2 function in writing (Sub-task, Resolved, Cheng Pan)

        People

          Assignee: Unassigned
          Reporter: Wenchen Fan (cloud_fan)

          Dates

            Created:
            Updated: