Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-11689

[Rust][DataFusion] Reduce copies in DataFusion LogicalPlan and Expr creation

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Rust, Rust - DataFusion
    • None

    Description

      The theme of this overall epic to make the plan and expression rewriting phases of DataFusion more efficient by avoiding copies by leveraging the Rust type system

      Benefits:

      • More standard / idomatic Rust usage
      • faster / more efficient (I don't have numbers to back this up)

      Downsides:

      • These will be backwards incompatible changes

      Background

      Many things in DataFusion look like

      Input -tranformation->output

      And the input is not used again. In rust, you can model this by giving ownership to the transformation

      At a high level the idea is to avoid so much cloning in DataFustion

      The basic principle is if the function needs to `clone` one of its arguments, the caller should be given the choice of when to do that. Often, the caller can give up ownership without issue

      I envision at least the following the following items:
      1. Optimizer passes that take `&LogicalPlan` and produce a new `LogicalPlan` even though most callsites do not need the original
      2. Expr builder calls that take `&expr` and return a new `Expr`
      3. An expression rewriter (TODO) while running down optimizer passes

      I think this style takes advantage of Rust's ownership model and will let us avoid a lot o copying and allocations and avoid the need for something like slab allocators

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              alamb Andrew Lamb
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 10h
                  10h