Spark / SPARK-22386

Data Source V2 improvements

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

      Sub-Tasks

        Summary | Type | Status | Assignee
        1. Limit push down | Sub-task | Resolved | Unassigned
        2. Aggregate push down | Sub-task | Resolved | Unassigned
        3. add `MetadataCreationSupport` trait to separate data and metadata handling at write path | Sub-task | Resolved | Unassigned
        4. DataSourceV2 should use immutable trees. | Sub-task | Resolved | Ryan Blue
        5. DataSourceV2 should support named tables in DataFrameReader, DataFrameWriter | Sub-task | Resolved | Unassigned
        6. Reorganize packages in data source V2 | Sub-task | Resolved | Gengliang Wang
        7. DataSourceV2 should apply some validation when writing. | Sub-task | Resolved | Unassigned
        8. DataSourceV2 should use the output commit coordinator. | Sub-task | Resolved | Ryan Blue
        9. DataSourceV2 readers should always produce InternalRow. | Sub-task | Resolved | Ryan Blue
        10. DataSourceOptions should handle path and table names to avoid confusion. | Sub-task | Resolved | Wenchen Fan
        11. use InternalRow in DataSourceWriter | Sub-task | Resolved | Wenchen Fan
        12. DataSourceV2 should provide a way to get a source's schema. | Sub-task | Resolved | Unassigned
        13. DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema | Sub-task | Resolved | Ryan Blue
        14. DataSourceV2: Rename DataReaderFactory to InputPartition. | Sub-task | Resolved | Ryan Blue
        15. Data Source V2: Join Push Down | Sub-task | Resolved | Unassigned
        16. DataSourceV2 should push filters and projection at physical plan conversion | Sub-task | Resolved | Ryan Blue
        17. remove SupportsDeprecatedScanRow | Sub-task | Resolved | Wenchen Fan
        18. Add support for USING syntax for DataSourceV2 | Sub-task | Resolved | Unassigned
        19. merge ReadSupport and ReadSupportWithSchema | Sub-task | Resolved | Wenchen Fan
        20. DataSourceV2: Remove SupportsPushDownCatalystFilters | Sub-task | Resolved | Reynold Xin
        21. DataSourceV2: Add interfaces to pass required sorting and clustering for writes | Sub-task | Resolved | Unassigned
        22. DataSourceV2: Structured Streaming does not respect SessionConfigSupport | Sub-task | Resolved | Hyukjin Kwon
        23. Avoid to create a readsupport at write path in Data Source V2 | Sub-task | Resolved | Hyukjin Kwon
        24. Recover options and properties and pass them back into the v1 API | Sub-task | Open | Unassigned
        25. DataSourceV2: Add new DataFrameWriter API for v2 | Sub-task | Resolved | Ryan Blue
        26. Pass in number of partitions to BuildWriter | Sub-task | Resolved | Ximo Guanter
        27. DataSource V2: API to request distribution and ordering on write | Sub-task | Resolved | Anton Okolnychyi
        28. Data Source V2: Remove read specific distributions | Sub-task | Open | Unassigned
        29. DataSource V2: Build logical writes in the optimizer | Sub-task | Resolved | Anton Okolnychyi
        30. DataSource V2: Inject repartition and sort nodes to satisfy required distribution and ordering | Sub-task | Resolved | Anton Okolnychyi
        31. DataSource V2: Use Write abstraction in StreamExecution | Sub-task | Resolved | Anton Okolnychyi
        32. DataSource V2: Support required distribution and ordering in SS | Sub-task | Resolved | Anton Okolnychyi
        33. Let AQE determine the right parallelism in DistributionAndOrderingUtils | Sub-task | Open | Unassigned
        34. DS V2 Aggregate push down | Sub-task | Resolved | Huaxin Gao
        35. Aggregate (Min/Max/Count) push down for ORC | Sub-task | Resolved | Cheng Su
        36. Aggregate (Min/Max/Count) push down for Parquet | Sub-task | Resolved | Huaxin Gao
        37. Push down group by partition column for Aggregate (Min/Max/Count) for Parquet | Sub-task | Resolved | Huaxin Gao
        38. Push down filter by partition column for Aggregate (Min/Max/Count) for Parquet | Sub-task | Resolved | Huaxin Gao
        39. Add benchmark for aggregate push down | Sub-task | Open | Unassigned
        40. Do not split input file for Parquet reader with aggregate push down | Sub-task | Resolved | Cheng Su
        41. Not log empty aggregate and group by in JDBCScan | Sub-task | Resolved | Huaxin Gao
        42. DataSourceV2: Distribution and ordering support V2 function in writing | Sub-task | Resolved | Cheng Pan

          People

            Assignee: Unassigned
            Reporter: Wenchen Fan (cloud_fan)
            Votes: 9
            Watchers: 47
