  1. Apache Arrow
  2. ARROW-11331

[Rust][DataFusion] Improve performance of Array.slice

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust, Rust - DataFusion
    • Labels: None

    Description

      Since https://github.com/apache/arrow/pull/9271, DataFusion uses Array.slice to pass data into the accumulators, instead of paying the overhead of building new arrays (possibly with only a few rows each).

      However, slicing currently seems quite inefficient: it accounts for about 1/6 of the instructions in hash aggregates and does some allocations under the hood instead of the promised "zero copy". That is much more than, for example, take, which copies / shuffles the entire array based on indices.
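
      To make the expectation concrete, here is a minimal sketch of what a "zero copy" slice looks like next to a take-style gather. It does not use the Arrow crate: Int32View and all of its methods are hypothetical stand-ins, assuming an array is just a shared buffer plus an (offset, len) window.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a primitive array:
/// a reference-counted buffer plus an (offset, len) window into it.
#[derive(Clone, Debug)]
struct Int32View {
    values: Arc<[i32]>,
    offset: usize,
    len: usize,
}

impl Int32View {
    fn new(data: Vec<i32>) -> Self {
        let len = data.len();
        Int32View { values: data.into(), offset: 0, len }
    }

    /// Zero-copy slice: shares the buffer (one refcount increment)
    /// and only adjusts the window. No values are copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32View {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }

    /// take-style gather: allocates a new buffer and copies every selected value.
    fn take(&self, indices: &[usize]) -> Self {
        let src = self.as_slice();
        let data: Vec<i32> = indices.iter().map(|&i| src[i]).collect();
        Int32View::new(data)
    }

    /// The values currently visible through the window.
    fn as_slice(&self) -> &[i32] {
        &self.values[self.offset..self.offset + self.len]
    }
}
```

      In this sketch, slice costs one reference-count increment regardless of length, while take is linear in the number of indices. That is the relationship one would expect, and the opposite of what the profile above shows.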

      jorgecarleitao

      Yes, slicing is suboptimal atm. Also, IMO it should not be Array that implements that method, but each implementation individually. I haven't touched that part yet, though.
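
      One way to read the suggestion above is that each concrete array type would provide its own slice, using knowledge of its own layout. The sketch below is only an illustration under that assumption: the Sliceable trait, Int32Values and Utf8Values are hypothetical names, not types from the Arrow crate.

```rust
use std::sync::Arc;

/// Hypothetical trait: each concrete array type provides its own slice.
trait Sliceable: Sized {
    fn slice(&self, offset: usize, len: usize) -> Self;
}

/// Fixed-width values: one shared buffer and a window.
struct Int32Values {
    values: Arc<[i32]>,
    offset: usize,
    len: usize,
}

impl Int32Values {
    fn new(data: Vec<i32>) -> Self {
        let len = data.len();
        Int32Values { values: data.into(), offset: 0, len }
    }

    fn as_slice(&self) -> &[i32] {
        &self.values[self.offset..self.offset + self.len]
    }
}

impl Sliceable for Int32Values {
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32Values {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }
}

/// Variable-width strings: shared string data plus shared offsets
/// (n + 1 entries for n strings), and a window over the offsets.
struct Utf8Values {
    data: Arc<str>,
    offsets: Arc<[usize]>,
    start: usize,
    len: usize,
}

impl Utf8Values {
    fn from_strs(items: &[&str]) -> Self {
        let mut offsets = vec![0usize];
        let mut data = String::new();
        for s in items {
            data.push_str(s);
            offsets.push(data.len());
        }
        Utf8Values { data: data.into(), offsets: offsets.into(), start: 0, len: items.len() }
    }

    fn value(&self, i: usize) -> &str {
        assert!(i < self.len, "index out of bounds");
        let lo = self.offsets[self.start + i];
        let hi = self.offsets[self.start + i + 1];
        &self.data[lo..hi]
    }
}

impl Sliceable for Utf8Values {
    /// Only the window over the offsets moves; both buffers stay shared.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Utf8Values {
            data: Arc::clone(&self.data),
            offsets: Arc::clone(&self.offsets),
            start: self.start + offset,
            len,
        }
    }
}
```

      With per-type implementations like these, a slice never needs to go through a generic intermediate representation, which is one plausible source of the hidden allocations mentioned in the description.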


          People

            Assignee: Unassigned
            Reporter: Daniël Heres (Dandandan)