Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-38852

Better Data Source V2 operator pushdown framework

Rank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.3.0
    • None
    • SQL
    • None

    Description

      Currently, Spark supports push down Filters and Aggregates to data source.
      However, the Data Source V2 operator pushdown framework has the following shortcomings:

      1. Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios
      2. The incompatibility of SQL syntax makes it impossible to apply in most scenarios
      3. Aggregate push down does not support multiple partitions of data sources
      4. Spark's additional aggregate will cause some overhead
      5. Limit push down is not supported
      6. Top n push down is not supported
      7. Aggregate push down does not support group by expressions
      8. Aggregate push down does not support not use aggregate functions
      9. Offset push down is not supported
      10. Paging push down is not supported
      11. UDF/UDAF push down is not supported

      Attachments

        Issue Links

        1.
        pushDownPredicate=false should prevent push down filters to JDBC data source Sub-task Resolved Jiaan Geng Actions
        2.
        Improve the implement of aggregate pushdown. Sub-task Resolved Jiaan Geng Actions
        3.
        Move compileAggregates from JDBCRDD to JdbcDialect Sub-task Resolved Jiaan Geng Actions
        4.
        Support push down top N to JDBC data source V2 Sub-task Resolved Jiaan Geng Actions
        5.
        Support datasource v2 complete aggregate pushdown Sub-task Resolved Jiaan Geng Actions
        6.
        Improve the implement of JDBCV2Suite Sub-task Resolved Jiaan Geng Actions
        7.
        Translate more standard aggregate functions for pushdown Sub-task Resolved Jiaan Geng Actions
        8.
        Upgrade h2 from 1.4.195 to 2.0.202 Sub-task Resolved Jiaan Geng Actions
        9.
        DS V2 supports partial aggregate push-down AVG Sub-task Resolved Jiaan Geng Actions
        10.
        Compile aggregate functions of build-in JDBC dialect Sub-task Resolved Jiaan Geng Actions
        11.
        A new framework to represent catalyst expressions in DS v2 APIs Sub-task Resolved Jiaan Geng Actions
        12.
        Reactor framework so as JDBC dialect could compile expression by self way Sub-task Resolved Jiaan Geng Actions
        13.
        Add factory method getConnection into JDBCDialect. Sub-task Resolved Jiaan Geng Actions
        14.
        If `Sum`, `Count`, `Any` accompany distinct, cannot do partial agg push down. Sub-task Resolved Jiaan Geng Actions
        15.
        Refactor framework so as JDBC dialect could compile filter by self way Sub-task Resolved Jiaan Geng Actions
        16.
        DS V2 aggregate push-down supports project with alias Sub-task Resolved Jiaan Geng Actions
        17.
        DS V2 topN push-down supports project with alias Sub-task Resolved Jiaan Geng Actions
        18.
        Datasource v2 supports partial topN push-down Sub-task Resolved Jiaan Geng Actions
        19.
        Support push down Cast to JDBC data source V2 Sub-task Resolved Jiaan Geng Actions
        20.
        If limit could pushed down and Data source only have one partition, DS V2 should not do limit again Sub-task Resolved Apache Spark Actions
        21.
        DS V2 supports push down misc non-aggregate functions Sub-task Resolved Jiaan Geng Actions
        22.
        DS V2 supports push down math functions Sub-task Resolved Jiaan Geng Actions
        23.
        Update document of JDBC options for pushDownAggregate and pushDownLimit Sub-task Resolved Jiaan Geng Actions
        24.
        DS V2 supports push down string functions Sub-task Resolved Zhixiong Chen Actions
        25.
        DS V2 supports push down collection functions Sub-task Open Unassigned Actions
        26.
        DS V2 supports push down misc functions Sub-task Resolved Zhixiong Chen Actions
        27.
        DS V2 supports push down OFFSET operator Sub-task Resolved Jiaan Geng Actions
        28.
        DS V2 aggregate push-down supports group by expressions Sub-task Resolved Jiaan Geng Actions
        29.
        DS V2 supports push down datetime functions Sub-task Resolved Jiaan Geng Actions
        30.
        DS V2 Top N push-down supports order by expressions Sub-task Resolved Jiaan Geng Actions
        31.
        DS V2 Limit push-down should avoid out of memory Sub-task Resolved Unassigned Actions
        32.
        DS V2 aggregate partial push-down should supports group by without aggregate functions Sub-task Resolved Jiaan Geng Actions
        33.
        DS V2 supports push down DS V2 UDF Sub-task Resolved Jiaan Geng Actions
        34.
        DS V2 aggregate push down can work with OFFSET or LIMIT Sub-task Resolved Jiaan Geng Actions
        35.
        H2Dialect should override getJDBCType so as make the data type is correct Sub-task Resolved Jiaan Geng Actions
        36.
        Jdbc dialect should decide which function could be pushed down. Sub-task Resolved Jiaan Geng Actions
        37.
        JDBC dialect supports registering dialect specific functions Sub-task Resolved Jiaan Geng Actions
        38.
        Compile build-in linear regression aggregate functions for JDBC dialect Sub-task Resolved Jiaan Geng Actions
        39.
        Translate linear regression aggregate functions for pushdown Sub-task Resolved Jiaan Geng Actions
        40.
        Capitalize sql keywords in JDBCV2Suite Sub-task Resolved Jiaan Geng Actions
        41.
        DS V2 supports push down misc non-aggregate functions(non ANSI) Sub-task Resolved Jiaan Geng Actions
        42.
        DS V2 supports push down math functions(non ANSI) Sub-task Resolved Jiaan Geng Actions
        43.
        DS V2 pushdown should unify the compile API Sub-task Resolved Jiaan Geng Actions
        44.
        Update document of JDBC options for pushDownOffset Sub-task Resolved Jiaan Geng Actions
        45.
        DS V2 aggregate push down can work with Top N or Paging (Sort with group expressions) Sub-task Resolved Jiaan Geng Actions
        46.
        Simplify V2ExpressionBuilder by extract common method. Sub-task Resolved Jiaan Geng Actions
        47.
        Translate common used aggregate functions for pushdown (non ANSI) Sub-task Open Unassigned Actions
        48.
        Organize the check of push down information for JDBCV2Suite Sub-task Resolved miracle Actions
        49.
        DS V2 supports push down string functions(non ANSI) Sub-task Resolved Unassigned Actions
        50.
        DS V2 push-down translate Cast if the cast is safe Sub-task Resolved Jiaan Geng Actions
        51.
        DS V2 pushdown should unify the translate path Sub-task Resolved Jiaan Geng Actions
        52.
        DS V2 expressions should have the default toString Sub-task Resolved Apache Spark Actions
        53.
        Extract the function that construct the select statement for JDBC dialect. Sub-task Resolved Jiaan Geng Actions
        54.
        DS V2 pushdown supports supports JDBC dialects compile `SortOrder` by themselves Sub-task Resolved Jiaan Geng Actions
        55.
        DS V2 pushdown could let JDBC dialect decide to push down offset and limit Sub-task Resolved Jiaan Geng Actions
        56.
        The built-in dialects support OFFSET and paging query. Sub-task Resolved Unassigned Actions
        57.
        Fix the bug that pushdown offset or paging is invalid for some built-in dialect Sub-task Resolved Unassigned Actions
        58.
        Change the default value of JDBC options about push down to true Sub-task Resolved Jiaan Geng Actions
        59.
        Abstract the excluded method for better test for JDBC docker tests. Sub-task Resolved Jiaan Geng Actions
        60.
        Improve the hashCode for Some DS V2 Expression Sub-task Resolved Jiaan Geng Actions
        61.
        DS V2 supports push down Mode Sub-task Resolved Jiaan Geng Actions
        62.
        Escape the single quote, _ and % for DS V2 pushdown Sub-task Resolved Jiaan Geng Actions
        63.
        DS V2 supports push down PERCENTILE_CONT and PERCENTILE_DISC Sub-task Resolved Jiaan Geng Actions

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            beliefer Jiaan Geng

            Dates

              Created:
              Updated:

              Slack

                Issue deployment