SPARK-35042

Support traversal pruning in transform/resolve functions and their call sites


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Optimizer
    • Labels: None

    Description

      Transform/resolve functions are called ~280k times per query on average across TPC-DS queries, which is far more than necessary. We can reduce those calls by giving the traversals early-exit information and conditions. This doc has some evaluation numbers with a prototype.
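To make the early-exit idea concrete, here is a minimal, self-contained Scala sketch. It is a toy model, not Catalyst code: the `Expr` classes, the pattern ids, and the `visited` counter are all hypothetical, and only the method name `transformWithPruning` is borrowed from the sub-task titles. It assumes each node caches a bit set of the patterns found in its subtree, so a traversal can skip any subtree whose bits show the rule cannot match there.

```scala
import scala.collection.immutable.BitSet

object PruningSketch {
  // Hypothetical pattern ids for this toy tree (not Spark's TreePattern values).
  val LITERAL = 0
  val ATTRIBUTE = 1
  val ADD = 2

  // Counts the nodes a traversal actually touches, for illustration only.
  var visited = 0

  sealed trait Expr {
    def pattern: Int
    def children: Seq[Expr]
    def withChildren(cs: Seq[Expr]): Expr

    // Patterns present in this node or any descendant, computed once per node.
    lazy val patternBits: BitSet =
      children.foldLeft(BitSet(pattern))((bits, c) => bits | c.patternBits)

    // Top-down transform that returns early when `cond` rejects the subtree's bits.
    def transformWithPruning(cond: BitSet => Boolean)(
        rule: PartialFunction[Expr, Expr]): Expr =
      if (!cond(patternBits)) this
      else {
        visited += 1
        val after = rule.applyOrElse(this, (e: Expr) => e)
        val cs = after.children.map(_.transformWithPruning(cond)(rule))
        if (cs.isEmpty) after else after.withChildren(cs)
      }
  }

  case class Literal(value: Int) extends Expr {
    val pattern = LITERAL
    val children = Seq.empty[Expr]
    def withChildren(cs: Seq[Expr]): Expr = this
  }

  case class Attribute(name: String) extends Expr {
    val pattern = ATTRIBUTE
    val children = Seq.empty[Expr]
    def withChildren(cs: Seq[Expr]): Expr = this
  }

  case class Add(left: Expr, right: Expr) extends Expr {
    val pattern = ADD
    val children = Seq(left, right)
    def withChildren(cs: Seq[Expr]): Expr = Add(cs(0), cs(1))
  }

  def main(args: Array[String]): Unit = {
    val tree = Add(Add(Attribute("a"), Attribute("b")), Add(Literal(1), Literal(2)))
    // Toy constant-folding rule: it can only fire where a Literal exists.
    val fold: PartialFunction[Expr, Expr] = {
      case Add(Literal(x), Literal(y)) => Literal(x + y)
    }

    visited = 0
    val unpruned = tree.transformWithPruning(_ => true)(fold)
    println(s"no pruning:   $visited nodes visited") // 5

    visited = 0
    val pruned = tree.transformWithPruning(_.contains(LITERAL))(fold)
    println(s"with pruning: $visited nodes visited") // 2

    println(unpruned == pruned) // true: same result, fewer visits
  }
}
```

In this sketch the left `Add(a, b)` subtree is skipped entirely because its cached bits contain no `LITERAL`, while the rewritten tree is identical to the unpruned result. The saving grows with the size of the subtrees that a rule's condition rules out.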

      Sub-Tasks

        1. Support traversal pruning in the transform function family (Sub-task; Resolved; Yingyi Bu)
        2. Support traversal pruning in resolve functions in AnalysisHelper (Sub-task; Resolved; Yingyi Bu)
        3. Use static treePatternBitSet for Leaf expressions like AttributeReference and Literal (Sub-task; Resolved; Yingyi Bu)
        4. Fix BitSet.union (Sub-task; Resolved; Unassigned)
        5. Migrate to transformWithPruning or resolveWithPruning for subquery related rules (Sub-task; Resolved; Yingyi Bu)
        6. Migrate to transformWithPruning for leftover optimizer rules (Sub-task; Resolved; Yingyi Bu)
        7. Migrate to transformWithPruning or resolveWithPruning for expression rules (Sub-task; Resolved; Yingyi Bu)
        8. Migrate to transformWithPruning or resolveWithPruning for object rules (Sub-task; Resolved; Yingyi Bu)
        9. Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis (Sub-task; Resolved; Yingyi Bu)
        10. Migrate to resolveWithPruning for two command rules (Sub-task; Resolved; Unassigned)
        11. Support traversal pruning in transformUpWithNewOutput (Sub-task; Open; Unassigned)
        12. Add rule id to all Analyzer rules in fixed point batches (Sub-task; Resolved; Yingyi Bu)
        13. Migrate to transformWithPruning for top-level rules under catalyst/optimizer (Sub-task; Resolved; Yingyi Bu)
        14. Migrate to transformWithPruning for rules in optimizer/Optimizer.scala (Sub-task; Resolved; Yingyi Bu)
        15. Migrate transformAllExpressions callsites to transformAllExpressionsWithPruning (Sub-task; Resolved; Apache Spark)
        16. Add tree pattern pruning into Analyzer rules (Sub-task; Resolved; Yingyi Bu)
        17. Add rule id pruning to the TypeCoercion rule (Sub-task; Resolved; Yingyi Bu)
        18. Support traversal pruning in extendedResolutionRules and postHocResolutionRules (Sub-task; In Progress; Unassigned)
        19. Add tree pattern pruning to CTESubstitution rule (Sub-task; Resolved; Josh Rosen)
        20. Identify aggregation expression in the nodePatterns of PythonUDF (Sub-task; Resolved; Gengliang Wang)
        21. Add a linter rule to enforce transforming with pruning (Sub-task; Open; Unassigned)


          People

            Unassigned
            Yingyi Bu (buyingyi)
            Gengliang Wang

