Apache Arrow / ARROW-11606

[Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction

Details

    Description

      We have run into an issue in the Ballista project where we are reconstructing the Final and Partial HashAggregateExec operators [1] for distributed execution, and we need some guidance.

      The Partial HashAggregateExec gets created OK and executes correctly.

      However, when we create the Final HashAggregateExec, it does not find the expected schema in its input operator. The Partial exec outputs field names ending with "[sum]", "[count]", and so on, but the Final aggregate does not appear to look up its input columns by those names.
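      A minimal sketch of the mismatch, assuming a simplified model rather than DataFusion's actual API: the partial stage names each aggregate state field `<expr>[<state>]` (mirroring the "[sum]"/"[count]" suffixes above), and the final stage resolves its inputs by field name. `Schema`, `state_field_name`, `partial_avg_schema`, and `resolve_final_inputs` are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for an Arrow schema: just an ordered list of field names.
#[derive(Debug, Clone)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    /// Position of the field with the given name, if present.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == name)
    }
}

/// Assumed naming convention for partial state fields: "<expr>[<state>]".
fn state_field_name(expr: &str, state: &str) -> String {
    format!("{}[{}]", expr, state)
}

/// Schema emitted by the (hypothetical) partial stage for AVG(expr) grouped by one column:
/// the group column plus one field per state value.
fn partial_avg_schema(group_col: &str, expr: &str) -> Schema {
    Schema {
        fields: vec![
            group_col.to_string(),
            state_field_name(expr, "count"),
            state_field_name(expr, "sum"),
        ],
    }
}

/// Resolve the columns the final stage expects; Err names the first missing field.
fn resolve_final_inputs(input: &Schema, expected: &[String]) -> Result<Vec<usize>, String> {
    expected
        .iter()
        .map(|name| {
            input
                .index_of(name)
                .ok_or_else(|| format!("field '{}' not found in input schema", name))
        })
        .collect()
}

fn main() {
    let expr = "AVG(c2)";
    let partial = partial_avg_schema("c1", expr);
    println!("partial output fields: {:?}", partial.fields);
    // Final stage derives its expected names with the same convention: lookup succeeds.
    let same = [state_field_name(expr, "count"), state_field_name(expr, "sum")];
    println!("{:?}", resolve_final_inputs(&partial, &same)); // Ok([1, 2])
    // Final stage expects a differently named column: lookup fails.
    println!("{:?}", resolve_final_inputs(&partial, &["c2".to_string()]));
}
```

      If the final operator derives the names it expects from a different convention than the one the partial operator used (for example, the original column name), the lookup fails even though the data is present, which would produce the kind of symptom described above.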

      It is also worth noting that the Final and Partial executors are not connected directly in this usage.

      The Partial exec is executed and its output is streamed to disk.

      The Final exec then runs against the output from the Partial exec.
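      Because the two operators are decoupled, the partial output has to carry enough state for the final stage to merge it. The sketch below is a hypothetical model, not Ballista's or DataFusion's code: for AVG, each partition emits a (count, sum) state, the states are written out as plain text as a stand-in for the on-disk output, and the final stage parses and merges them. `AvgState`, `partial_avg`, `materialize`, and `final_avg` are illustrative names only.

```rust
/// Hypothetical partial state for AVG: the two values the partial stage emits per group
/// (the "[count]" and "[sum]" fields under the naming convention assumed above).
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgState {
    count: u64,
    sum: f64,
}

/// Partial aggregation: runs independently on each input partition.
fn partial_avg(values: &[f64]) -> AvgState {
    AvgState { count: values.len() as u64, sum: values.iter().sum() }
}

/// Stand-in for streaming partial output to disk: one "count,sum" line per partition.
fn materialize(states: &[AvgState]) -> String {
    states.iter().map(|s| format!("{},{}\n", s.count, s.sum)).collect()
}

/// Final aggregation: reads the materialized states and merges them
/// (adds counts and sums) before dividing; returns None on empty or malformed input.
fn final_avg(materialized: &str) -> Option<f64> {
    let mut total = AvgState { count: 0, sum: 0.0 };
    for line in materialized.lines() {
        let (c, s) = line.split_once(',')?;
        total.count += c.parse::<u64>().ok()?;
        total.sum += s.parse::<f64>().ok()?;
    }
    if total.count == 0 { None } else { Some(total.sum / total.count as f64) }
}

fn main() {
    let partitions = [vec![1.0, 2.0, 3.0], vec![10.0]];
    let states: Vec<AvgState> = partitions.iter().map(|p| partial_avg(p)).collect();
    let on_disk = materialize(&states);
    println!("{:?}", final_avg(&on_disk)); // Some(4.0)
}
```

      Merging states rather than per-partition results is what keeps the answer correct: averaging the two partition averages (2.0 and 10.0) would give 6.0 instead of 4.0.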

      Do we need to make changes in DataFusion so that other crates can support this kind of use case?

       [1] https://github.com/ballista-compute/ballista/pull/491


    People

      Andy Grove (andygrove)

    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 1h 20m