ARROW-5227: [Rust] [DataFusion] Re-implement query execution with an extensible physical query plan


    Details

      Description

      This story (which, in hindsight, should probably have been an epic) re-implements query execution in DataFusion using a physical plan that supports partitions and parallel execution.

      It replaces the current query execution, which runs directly from the logical plan.

      The new physical plan is based on traits and is therefore extensible by other projects that use Arrow. For example, another project could add physical plans for distributed compute.
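
      To make the idea of a trait-based, partitioned physical plan concrete, here is a minimal sketch. It is not the DataFusion API: the names `Partition`, `ExecutionPlan`, `MemoryScan`, `FilterPlan`, and `execute_parallel` are hypothetical, and a plain `Vec<i64>` stands in for an Arrow record batch. It only illustrates how operators defined as traits can be composed by other crates and how each partition can run on its own thread.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for an Arrow record batch (assumption made for this sketch).
type Batch = Vec<i64>;

/// One independently executable slice of a physical plan (hypothetical trait).
trait Partition: Send + Sync {
    fn execute(&self) -> Result<Vec<Batch>, String>;
}

/// A physical operator that exposes its work as partitions (hypothetical trait).
trait ExecutionPlan {
    fn partitions(&self) -> Vec<Arc<dyn Partition>>;
}

/// In-memory scan: each inner Vec holds the rows of one partition.
struct MemoryScan {
    data: Vec<Vec<i64>>,
}

struct MemoryPartition {
    rows: Vec<i64>,
}

impl Partition for MemoryPartition {
    fn execute(&self) -> Result<Vec<Batch>, String> {
        Ok(vec![self.rows.clone()])
    }
}

impl ExecutionPlan for MemoryScan {
    fn partitions(&self) -> Vec<Arc<dyn Partition>> {
        self.data
            .iter()
            .map(|rows| Arc::new(MemoryPartition { rows: rows.clone() }) as Arc<dyn Partition>)
            .collect()
    }
}

/// Selection operator: wraps any input plan and filters each partition.
struct FilterPlan {
    input: Arc<dyn ExecutionPlan>,
    predicate: fn(i64) -> bool,
}

struct FilterPartition {
    input: Arc<dyn Partition>,
    predicate: fn(i64) -> bool,
}

impl Partition for FilterPartition {
    fn execute(&self) -> Result<Vec<Batch>, String> {
        let batches = self.input.execute()?;
        Ok(batches
            .into_iter()
            .map(|b| b.into_iter().filter(|v| (self.predicate)(*v)).collect())
            .collect())
    }
}

impl ExecutionPlan for FilterPlan {
    fn partitions(&self) -> Vec<Arc<dyn Partition>> {
        // One output partition per input partition, so parallelism is preserved.
        self.input
            .partitions()
            .into_iter()
            .map(|p| {
                Arc::new(FilterPartition { input: p, predicate: self.predicate })
                    as Arc<dyn Partition>
            })
            .collect()
    }
}

/// Runs every partition on its own thread and collects batches in partition order.
fn execute_parallel(plan: &dyn ExecutionPlan) -> Result<Vec<Batch>, String> {
    let handles: Vec<_> = plan
        .partitions()
        .into_iter()
        .map(|p| thread::spawn(move || p.execute()))
        .collect();
    let mut out = Vec::new();
    for handle in handles {
        let batches = handle
            .join()
            .map_err(|_| "partition thread panicked".to_string())??;
        out.extend(batches);
    }
    Ok(out)
}
```

      Because `FilterPlan` depends only on the `ExecutionPlan` trait, a separate project could supply its own implementation (for example, one that ships partitions to remote workers) without changing the operators shown here.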

      See the design doc at https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing for more information.

        Issue Links

        1. [Rust] [DataFusion] Create traits for physical query plan (Sub-task, Resolved, Andy Grove, time spent 0.5h)
        2. [Rust] [DataFusion] Implement parallel execution for parquet scan (Sub-task, Resolved, Andy Grove, time spent 1h 10m)
        3. [Rust] [DataFusion] Implement parallel execution for CSV scan (Sub-task, Resolved, Andy Grove)
        4. [Rust] [DataFusion] Implement parallel execution for projection (Sub-task, Resolved, Andy Grove, time spent 2.5h)
        5. [Rust] [DataFusion] Implement parallel execution for selection (Sub-task, Resolved, Andy Grove, time spent 1h 50m)
        6. [Rust] [DataFusion] Implement parallel execution for hash aggregate (Sub-task, Resolved, Andy Grove, time spent 1.5h)
        7. [Rust] [DataFusion] Implement parallel execution for limit (Sub-task, Resolved, Andy Grove, time spent 1h 20m)
        8. [Rust] [DataFusion] Create physical plan from logical plan (Sub-task, Resolved, Andy Grove, time spent 2.5h)
        9. [Rust] [DataFusion] Refactor TableProvider to return thread-safe BatchIterator (Sub-task, Resolved, Andy Grove, time spent 40m)
        10. [Rust] [DataFusion] Create "merge" execution plan (Sub-task, Resolved, Andy Grove, time spent 40m)
        11. [Rust] [DataFusion] Implement SUM aggregate expression (Sub-task, Resolved, Andy Grove, time spent 2h 10m)
        12. [Rust] [DataFusion] Implement MIN and MAX aggregate expressions (Sub-task, Resolved, Unassigned, time spent 50m)
        13. [Rust] [DataFusion] Implement COUNT aggregate expression (Sub-task, Resolved, Unassigned, time spent 2h 40m)
        14. [Rust] [DataFusion] Implement AVG aggregate expression (Sub-task, Resolved, Unassigned, time spent 1h 40m)
        15. [Rust] [DataFusion] Implement numeric literal expressions (Sub-task, Resolved, Andy Grove, time spent 20m)
        16. [Rust] [DataFusion] Implement CAST expression (Sub-task, Resolved, Andy Grove, time spent 40m)
        17. [Rust] [DataFusion] Implement physical expression for binary expressions (Sub-task, Resolved, Andy Grove, time spent 50m)
        18. [Rust] [DataFusion] Update examples to use physical query plan (Sub-task, Resolved, Andy Grove, time spent 40m)
        19. [Rust] [DataFusion] Update unit tests to use physical query plan (Sub-task, Closed, Andy Grove)
        20. [Rust] [DataFusion] Update integration tests to use physical plan (Sub-task, Resolved, Andy Grove, time spent 50m)
        21. [Rust] [DataFusion] Remove execution of logical plan (Sub-task, Resolved, Andy Grove, time spent 0.5h)

            People

            • Assignee: Andy Grove (andygrove)
            • Reporter: Andy Grove (andygrove)

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 24h 50m
