Apache Arrow / ARROW-5227

[Rust] [DataFusion] Re-implement query execution with an extensible physical query plan


    Details

      Description

      This story (which, in hindsight, should perhaps have been an epic) re-implements query execution in DataFusion using a physical plan that supports partitions and parallel execution.
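      As a rough illustration of partitioned, parallel execution, here is a minimal sketch that uses only the Rust standard library (std::thread::scope, Rust 1.63 or later). It is not DataFusion or Arrow code: the Batch alias (a Vec<i64> standing in for an Arrow record batch), the function names, and the particular selection and projection are hypothetical. Each partition runs on its own thread, and the per-partition results are then concatenated in partition order, which is roughly the role a merge step plays.

```rust
use std::thread;

/// Hypothetical stand-in for an Arrow record batch: one column of i64 values.
type Batch = Vec<i64>;

/// Run a selection (keep values > `threshold`) and a projection (`v * 2`)
/// over every batch of a single partition. Empty output batches are dropped.
fn execute_partition(partition: &[Batch], threshold: i64) -> Vec<Batch> {
    partition
        .iter()
        .map(|batch| {
            batch
                .iter()
                .filter(|&&v| v > threshold) // selection
                .map(|&v| v * 2) // projection
                .collect::<Batch>()
        })
        .filter(|batch| !batch.is_empty())
        .collect()
}

/// Execute each partition on its own scoped thread, then merge the results
/// into one list of batches, preserving partition order.
fn execute_parallel(partitions: &[Vec<Batch>], threshold: i64) -> Vec<Batch> {
    thread::scope(|s| {
        // Spawn every thread first so the partitions actually run concurrently.
        let handles: Vec<_> = partitions
            .iter()
            .map(|p| s.spawn(move || execute_partition(p, threshold)))
            .collect();
        // Merge: join the threads in partition order and concatenate their batches.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("partition thread panicked"))
            .collect()
    })
}

fn main() {
    // Two partitions of one table (for example, two CSV or Parquet files).
    let partitions: Vec<Vec<Batch>> = vec![vec![vec![1, 5, 9], vec![2]], vec![vec![7, 3]]];
    let merged = execute_parallel(&partitions, 4);
    // Prints: [[10, 18], [14]]
    println!("{:?}", merged);
}
```

      Scoped threads let each worker borrow its partition directly instead of requiring the data to be cloned or wrapped in Arc.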

      This will replace the current approach, in which queries are executed directly from the logical plan.

      The new physical plan is based on traits and is therefore extensible by other projects that use Arrow. For example, another project could add physical plans for distributed compute.
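      A minimal sketch of how a trait-based physical plan can be extended, assuming a hypothetical ExecutionPlan trait (not DataFusion's actual API) in which every operator reports its number of output partitions and can execute any one partition independently. MemoryScan and Filter play the role of built-in operators; Coalesce stands in for an operator written by another project, and it composes with the others simply by implementing the same trait. An operator for distributed compute could follow the same pattern, fetching partitions from remote workers instead of local memory.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow record batch: one column of i64 values.
type Batch = Vec<i64>;

/// Hypothetical physical-plan trait (not DataFusion's actual API).
/// Send + Sync lets partitions be executed from different threads.
trait ExecutionPlan: Send + Sync {
    /// Number of independent output partitions this operator produces.
    fn output_partitions(&self) -> usize;
    /// Execute a single output partition and return its batches.
    fn execute(&self, partition: usize) -> Vec<Batch>;
}

/// "Built-in" leaf operator: scans partitions held in memory.
struct MemoryScan {
    partitions: Vec<Vec<Batch>>,
}

impl ExecutionPlan for MemoryScan {
    fn output_partitions(&self) -> usize {
        self.partitions.len()
    }
    fn execute(&self, partition: usize) -> Vec<Batch> {
        self.partitions[partition].clone()
    }
}

/// "Built-in" filter: keeps values >= `min` and preserves the input's partitioning.
struct Filter {
    input: Arc<dyn ExecutionPlan>,
    min: i64,
}

impl ExecutionPlan for Filter {
    fn output_partitions(&self) -> usize {
        self.input.output_partitions()
    }
    fn execute(&self, partition: usize) -> Vec<Batch> {
        self.input
            .execute(partition)
            .into_iter()
            .map(|batch| batch.into_iter().filter(|&v| v >= self.min).collect::<Batch>())
            .collect()
    }
}

/// Operator imagined as living in another project: it merges all input
/// partitions into a single output partition.
struct Coalesce {
    input: Arc<dyn ExecutionPlan>,
}

impl ExecutionPlan for Coalesce {
    fn output_partitions(&self) -> usize {
        1
    }
    fn execute(&self, partition: usize) -> Vec<Batch> {
        assert_eq!(partition, 0, "Coalesce has exactly one output partition");
        (0..self.input.output_partitions())
            .flat_map(|p| self.input.execute(p))
            .collect()
    }
}

fn main() {
    let scan: Arc<dyn ExecutionPlan> = Arc::new(MemoryScan {
        partitions: vec![vec![vec![1, 6]], vec![vec![8, 2, 9]]],
    });
    let filter: Arc<dyn ExecutionPlan> = Arc::new(Filter { input: scan, min: 5 });
    let plan = Coalesce { input: filter };
    // Prints: 1 partition(s): [[6], [8, 9]]
    println!("{} partition(s): {:?}", plan.output_partitions(), plan.execute(0));
}
```

      Because each partition can be executed independently, the caller decides how to schedule them: sequentially, on a thread pool, or on remote machines.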

      See the design document at https://docs.google.com/document/d/1ATZGIs8ry_kJeoTgmJjLrg6Ssb5VE7lNzWuz_4p6EWk/edit?usp=sharing for more information.


          Sub-Tasks

          1. [Rust] [DataFusion] Create traits for physical query plan (Sub-task, Resolved, assignee: Andy Grove, time spent: 0.5h)
          2. [Rust] [DataFusion] Implement parallel execution for parquet scan (Sub-task, Resolved, assignee: Andy Grove, time spent: 1h 10m)
          3. [Rust] [DataFusion] Implement parallel execution for CSV scan (Sub-task, Resolved, assignee: Andy Grove)
          4. [Rust] [DataFusion] Implement parallel execution for projection (Sub-task, Resolved, assignee: Andy Grove, time spent: 2.5h)
          5. [Rust] [DataFusion] Implement parallel execution for selection (Sub-task, Resolved, assignee: Andy Grove, time spent: 1h 50m)
          6. [Rust] [DataFusion] Implement parallel execution for hash aggregate (Sub-task, Resolved, assignee: Andy Grove, time spent: 1.5h)
          7. [Rust] [DataFusion] Implement parallel execution for limit (Sub-task, Resolved, assignee: Andy Grove, time spent: 1h 20m)
          8. [Rust] [DataFusion] Create physical plan from logical plan (Sub-task, Resolved, assignee: Andy Grove, time spent: 2.5h)
          9. [Rust] [DataFusion] Refactor TableProvider to return thread-safe BatchIterator (Sub-task, Resolved, assignee: Andy Grove, time spent: 40m)
          10. [Rust] [DataFusion] Create "merge" execution plan (Sub-task, Resolved, assignee: Andy Grove, time spent: 40m)
          11. [Rust] [DataFusion] Implement SUM aggregate expression (Sub-task, Resolved, assignee: Andy Grove, time spent: 2h 10m)
          12. [Rust] [DataFusion] Implement MIN and MAX aggregate expressions (Sub-task, Resolved, Unassigned, time spent: 50m)
          13. [Rust] [DataFusion] Implement COUNT aggregate expression (Sub-task, Resolved, Unassigned, time spent: 2h 40m)
          14. [Rust] [DataFusion] Implement AVG aggregate expression (Sub-task, Resolved, Unassigned, time spent: 1h 40m)
          15. [Rust] [DataFusion] Implement numeric literal expressions (Sub-task, Resolved, assignee: Andy Grove, time spent: 20m)
          16. [Rust] [DataFusion] Implement CAST expression (Sub-task, Resolved, assignee: Andy Grove, time spent: 40m)
          17. [Rust] [DataFusion] Implement physical expression for binary expressions (Sub-task, Resolved, assignee: Andy Grove, time spent: 50m)
          18. [Rust] [DataFusion] Update examples to use physical query plan (Sub-task, Resolved, assignee: Andy Grove, time spent: 40m)
          19. [Rust] [DataFusion] Update unit tests to use physical query plan (Sub-task, Closed, assignee: Andy Grove)
          20. [Rust] [DataFusion] Update integration tests to use physical plan (Sub-task, Resolved, assignee: Andy Grove, time spent: 50m)
          21. [Rust] [DataFusion] Remove execution of logical plan (Sub-task, Resolved, assignee: Andy Grove, time spent: 0.5h)
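      Several of the sub-tasks above (parallel hash aggregate, merge, and the SUM, MIN, MAX, COUNT, and AVG expressions) fit together naturally as a two-phase aggregation. The sketch below is a hypothetical illustration of that general technique in plain Rust, not DataFusion's implementation: each partition is first aggregated into per-group partial states, and the partial states are then merged. AggState, partial_aggregate, and final_aggregate are invented names. AVG is derived from the merged SUM and COUNT, because averaging per-partition averages gives the wrong answer when partitions hold different numbers of rows.

```rust
use std::collections::HashMap;

/// Hypothetical partial aggregate state for one group. AVG is not stored:
/// it is derived from SUM and COUNT after all partitions are merged.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AggState {
    sum: i64,
    count: u64,
    min: i64,
    max: i64,
}

impl AggState {
    fn new(v: i64) -> Self {
        AggState { sum: v, count: 1, min: v, max: v }
    }

    /// Fold one more input value into the state (used in phase 1).
    fn update(&mut self, v: i64) {
        self.sum += v;
        self.count += 1;
        self.min = self.min.min(v);
        self.max = self.max.max(v);
    }

    /// Combine another partition's state for the same group (used in phase 2).
    fn merge(&mut self, other: &AggState) {
        self.sum += other.sum;
        self.count += other.count;
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
    }

    fn avg(&self) -> f64 {
        self.sum as f64 / self.count as f64
    }
}

/// Phase 1: hash-aggregate the (group key, value) rows of one partition.
fn partial_aggregate(rows: &[(&str, i64)]) -> HashMap<String, AggState> {
    let mut groups: HashMap<String, AggState> = HashMap::new();
    for &(key, v) in rows {
        groups
            .entry(key.to_string())
            .and_modify(|s| s.update(v))
            .or_insert_with(|| AggState::new(v));
    }
    groups
}

/// Phase 2: merge the partial states from every partition, group by group.
fn final_aggregate(partials: Vec<HashMap<String, AggState>>) -> HashMap<String, AggState> {
    let mut result: HashMap<String, AggState> = HashMap::new();
    for partial in partials {
        for (key, state) in partial {
            result
                .entry(key)
                .and_modify(|s| s.merge(&state))
                .or_insert(state);
        }
    }
    result
}

fn main() {
    // Two partitions of (region, amount) rows; phase 1 could run on one thread per partition.
    let p0: [(&str, i64); 3] = [("east", 10), ("west", 4), ("east", 2)];
    let p1: [(&str, i64); 2] = [("east", 12), ("west", 8)];
    let totals = final_aggregate(vec![partial_aggregate(&p0), partial_aggregate(&p1)]);
    let east = totals["east"];
    // Prints: east: sum=24 count=3 min=2 max=12 avg=8
    println!(
        "east: sum={} count={} min={} max={} avg={}",
        east.sum, east.count, east.min, east.max, east.avg()
    );
}
```

      With this data the merged "east" group has sum 24 and count 3, so its average is 8, whereas averaging the two per-partition averages (6 and 12) would give 9.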


              People

              • Assignee: Andy Grove (andygrove)
                Reporter: Andy Grove (andygrove)
              • Votes: 0
                Watchers: 2


                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 24h 50m