Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-38852

Better Data Source V2 operator pushdown framework

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.3.0
    • None
    • SQL
    • None

    Description

      Currently, Spark supports push down Filters and Aggregates to data source.
      However, the Data Source V2 operator pushdown framework has the following shortcomings:

      1. Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios
      2. The incompatibility of SQL syntax makes it impossible to apply in most scenarios
      3. Aggregate push down does not support multiple partitions of data sources
      4. Spark's additional aggregate will cause some overhead
      5. Limit push down is not supported
      6. Top n push down is not supported
      7. Aggregate push down does not support group by expressions
      8. Aggregate push down does not support not use aggregate functions
      9. Offset push down is not supported
      10. Paging push down is not supported

      Attachments

        1.
        pushDownPredicate=false should prevent push down filters to JDBC data source Sub-task Resolved jiaan.geng
        2.
        Improve the implement of aggregate pushdown. Sub-task Resolved jiaan.geng
        3.
        Move compileAggregates from JDBCRDD to JdbcDialect Sub-task Resolved jiaan.geng
        4.
        Support push down top N to JDBC data source V2 Sub-task Resolved jiaan.geng
        5.
        Support datasource v2 complete aggregate pushdown Sub-task Resolved jiaan.geng
        6.
        Improve the implement of JDBCV2Suite Sub-task Resolved jiaan.geng
        7.
        Translate more standard aggregate functions for pushdown Sub-task Resolved jiaan.geng
        8.
        Upgrade h2 from 1.4.195 to 2.0.202 Sub-task Resolved jiaan.geng
        9.
        DS V2 supports partial aggregate push-down AVG Sub-task Resolved jiaan.geng
        10.
        Compile aggregate functions of build-in JDBC dialect Sub-task Resolved jiaan.geng
        11.
        A new framework to represent catalyst expressions in DS v2 APIs Sub-task Resolved jiaan.geng
        12.
        Reactor framework so as JDBC dialect could compile expression by self way Sub-task Resolved jiaan.geng
        13.
        Add factory method getConnection into JDBCDialect. Sub-task Resolved jiaan.geng
        14.
        If `Sum`, `Count`, `Any` accompany distinct, cannot do partial agg push down. Sub-task Resolved jiaan.geng
        15.
        Refactor framework so as JDBC dialect could compile filter by self way Sub-task Resolved jiaan.geng
        16.
        DS V2 aggregate push-down supports project with alias Sub-task Resolved jiaan.geng
        17.
        DS V2 topN push-down supports project with alias Sub-task Resolved jiaan.geng
        18.
        Datasource v2 supports partial topN push-down Sub-task Resolved jiaan.geng
        19.
        Support push down Cast to JDBC data source V2 Sub-task Resolved jiaan.geng
        20.
        If limit could pushed down and Data source only have one partition, DS V2 should not do limit again Sub-task Resolved Apache Spark
        21.
        DS V2 supports push down misc non-aggregate functions Sub-task Resolved jiaan.geng
        22.
        DS V2 supports push down math functions Sub-task Resolved jiaan.geng
        23.
        Update document of JDBC options for pushDownAggregate and pushDownLimit Sub-task Resolved jiaan.geng
        24.
        DS V2 supports push down string functions Sub-task Resolved Zhixiong Chen
        25.
        DS V2 supports push down collection functions Sub-task Open Unassigned
        26.
        DS V2 supports push down misc functions Sub-task Resolved Zhixiong Chen
        27.
        DS V2 supports push down OFFSET operator Sub-task Resolved jiaan.geng
        28.
        DS V2 aggregate push-down supports group by expressions Sub-task Resolved jiaan.geng
        29.
        DS V2 supports push down datetime functions Sub-task Resolved jiaan.geng
        30.
        DS V2 Top N push-down supports order by expressions Sub-task Resolved jiaan.geng
        31.
        DS V2 Limit push-down should avoid out of memory Sub-task Resolved Unassigned
        32.
        DS V2 aggregate partial push-down should supports group by without aggregate functions Sub-task Resolved jiaan.geng
        33.
        DS V2 supports push down DS V2 UDF Sub-task Resolved jiaan.geng
        34.
        DS V2 aggregate push down can work with OFFSET or LIMIT Sub-task Resolved jiaan.geng
        35.
        H2Dialect should override getJDBCType so as make the data type is correct Sub-task Resolved jiaan.geng
        36.
        Jdbc dialect should decide which function could be pushed down. Sub-task Resolved jiaan.geng
        37.
        JDBC dialect supports registering dialect specific functions Sub-task Resolved jiaan.geng
        38.
        Compile build-in linear regression aggregate functions for JDBC dialect Sub-task Resolved jiaan.geng
        39.
        Translate linear regression aggregate functions for pushdown Sub-task Resolved jiaan.geng
        40.
        Capitalize sql keywords in JDBCV2Suite Sub-task Resolved jiaan.geng
        41.
        DS V2 supports push down misc non-aggregate functions(non ANSI) Sub-task Resolved jiaan.geng
        42.
        DS V2 supports push down math functions(non ANSI) Sub-task Resolved jiaan.geng
        43.
        DS V2 pushdown should unify the compile API Sub-task Resolved jiaan.geng
        44.
        Update document of JDBC options for pushDownOffset Sub-task Resolved jiaan.geng
        45.
        DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions) Sub-task Resolved jiaan.geng
        46.
        Simplify V2ExpressionBuilder by extract common method. Sub-task Resolved jiaan.geng
        47.
        Translate common used aggregate functions for pushdown (non ANSI) Sub-task Open Unassigned
        48.
        Organize the check of push down information for JDBCV2Suite Sub-task Resolved miracle
        49.
        DS V2 supports push down string functions(non ANSI) Sub-task Resolved Unassigned
        50.
        DS V2 push-down translate Cast if the cast is safe Sub-task Resolved jiaan.geng
        51.
        DS V2 pushdown should unify the translate path Sub-task Resolved jiaan.geng
        52.
        DS V2 expressions should have the default toString Sub-task Resolved Apache Spark
        53.
        Extract the function that construct the select statement for JDBC dialect. Sub-task Resolved jiaan.geng
        54.
        DS V2 pushdown supports supports JDBC dialects compile `SortOrder` by themselves Sub-task Resolved jiaan.geng
        55.
        DS V2 pushdown could let JDBC dialect decide to push down offset and limit Sub-task Resolved jiaan.geng
        56.
        The built-in dialects support OFFSET and paging query. Sub-task Resolved Unassigned
        57.
        Fix the bug that pushdown offset or paging is invalid for some built-in dialect Sub-task In Progress Unassigned
        58.
        Change the default value of JDBC options about push down to true Sub-task Resolved jiaan.geng
        59.
        Abstract the excluded method for better test for JDBC docker tests. Sub-task Resolved jiaan.geng

        Activity

          People

            Unassigned Unassigned
            beliefer jiaan.geng
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated: