SPARK-22386

Data Source V2 improvements


    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
    • Target Version/s:

      Issue Links

      #. Summary | Type | Status | Assignee
      1. Limit push down | Sub-task | Open | Unassigned
      2. Aggregate push down | Sub-task | In Progress | Unassigned
      3. add `MetadataCreationSupport` trait to separate data and metadata handling at write path | Sub-task | Resolved | Unassigned
      4. DataSourceV2 should use immutable trees. | Sub-task | Resolved | Ryan Blue
      5. DataSourceV2 should support named tables in DataFrameReader, DataFrameWriter | Sub-task | Resolved | Unassigned
      6. Reorganize packages in data source V2 | Sub-task | Resolved | Gengliang Wang
      7. DataSourceV2 should apply some validation when writing. | Sub-task | Resolved | Unassigned
      8. DataSourceV2 should use the output commit coordinator. | Sub-task | Resolved | Ryan Blue
      9. DataSourceV2 readers should always produce InternalRow. | Sub-task | Resolved | Ryan Blue
      10. DataSourceOptions should handle path and table names to avoid confusion. | Sub-task | Resolved | Wenchen Fan
      11. use InternalRow in DataSourceWriter | Sub-task | Resolved | Wenchen Fan
      12. DataSourceV2 should provide a way to get a source's schema. | Sub-task | Resolved | Unassigned
      13. DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema | Sub-task | Resolved | Ryan Blue
      14. DataSourceV2: Rename DataReaderFactory to InputPartition. | Sub-task | Resolved | Ryan Blue
      15. Data Source V2: Join Push Down | Sub-task | Open | Unassigned
      16. DataSourceV2 should push filters and projection at physical plan conversion | Sub-task | Resolved | Ryan Blue
      17. remove SupportsDeprecatedScanRow | Sub-task | Resolved | Wenchen Fan
      18. Add support for USING syntax for DataSourceV2 | Sub-task | In Progress | Unassigned
      19. merge ReadSupport and ReadSupportWithSchema | Sub-task | Resolved | Wenchen Fan
      20. DataSourceV2: Remove SupportsPushDownCatalystFilters | Sub-task | Resolved | Reynold Xin
      21. DataSourceV2: Add interfaces to pass required sorting and clustering for writes | Sub-task | In Progress | Unassigned
      22. DataSourceV2: Structured Streaming does not respect SessionConfigSupport | Sub-task | Resolved | Hyukjin Kwon
      23. Avoid to create a readsupport at write path in Data Source V2 | Sub-task | Resolved | Hyukjin Kwon
      24. Recover options and properties and pass them back into the v1 API | Sub-task | Open | Unassigned
      25. DataSourceV2: Add new DataFrameWriter API for v2 | Sub-task | Resolved | Ryan Blue
      26. Pass in number of partitions to BuildWriter | Sub-task | Resolved | Ximo Guanter
      27. DataSource V2: API to request distribution and ordering on write | Sub-task | Resolved | Anton Okolnychyi
      28. Data Source V2: Remove read specific distributions | Sub-task | Open | Unassigned
      29. DataSource V2: Build logical writes in the optimizer | Sub-task | Resolved | Anton Okolnychyi
      30. DataSource V2: Inject repartition and sort nodes to satisfy required distribution and ordering | Sub-task | Resolved | Anton Okolnychyi
      31. DataSource V2: Use Write abstraction in StreamExecution | Sub-task | Resolved | Anton Okolnychyi
      32. DataSource V2: Support required distribution and ordering in SS | Sub-task | In Progress | Unassigned
      33. Let AQE determine the right parallelism in DistributionAndOrderingUtils | Sub-task | Open | Unassigned
      34. Aggregate (Min/Max/Count) push down for Parquet | Sub-task | In Progress | Apache Spark
      35. Aggregate (Min/Max/Count) push down for ORC | Sub-task | Open | Unassigned

          People

          • Assignee: Unassigned
          • Reporter: Wenchen Fan (cloud_fan)
