Details
Improvement
Status: Open
Major
Resolution: Unresolved
7.0.0
None
Description
Mentioned here:
https://github.com/apache/arrow/pull/11274#pullrequestreview-768267959
For example, a top-k implementation could periodically (whenever batches_ reaches some configurable number of rows) run through the buffered data and discard rows that can no longer be part of the result. The way it is written now, it would still require me to buffer the entire dataset in memory (and/or spill over).
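A minimal sketch of the periodic-pruning idea, using only the C++ standard library. It is not Arrow's API: `TopKBuffer`, `Consume`, `Finish`, and `prune_threshold` are hypothetical names, and plain `int` values stand in for rows of a record batch. The point is only that the buffer is cut back to k rows whenever it reaches a configurable size, so peak memory depends on the threshold rather than on the size of the dataset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a sink node's buffered state (not an Arrow class).
class TopKBuffer {
 public:
  // k: number of largest values to keep.
  // prune_threshold: buffered row count that triggers a pruning pass
  // (assumed configurable; clamped so it is never smaller than k).
  TopKBuffer(std::size_t k, std::size_t prune_threshold)
      : k_(k), prune_threshold_(std::max(prune_threshold, k)) {
    assert(k > 0);
  }

  // Append one incoming batch, then prune if the buffer reached the threshold.
  void Consume(const std::vector<int>& batch) {
    rows_.insert(rows_.end(), batch.begin(), batch.end());
    if (rows_.size() >= prune_threshold_) Prune();
  }

  // Return the k largest values seen so far, in descending order.
  std::vector<int> Finish() {
    Prune();
    std::sort(rows_.begin(), rows_.end(), std::greater<int>());
    return rows_;
  }

  std::size_t buffered_rows() const { return rows_.size(); }

 private:
  // Keep only the k largest rows; the rest can never reach the final top k.
  void Prune() {
    if (rows_.size() <= k_) return;
    auto kth = rows_.begin() + static_cast<std::ptrdiff_t>(k_);
    std::nth_element(rows_.begin(), kth, rows_.end(), std::greater<int>());
    rows_.resize(k_);
  }

  std::size_t k_;
  std::size_t prune_threshold_;
  std::vector<int> rows_;
};
```

With this shape, the buffer holds at most roughly `prune_threshold` rows plus one incoming batch at any time. A real sink node would additionally need to handle multi-column sort keys and thread safety, which this sketch ignores.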
Issue Links
- is depended upon by: ARROW-14254 [C++] Return a random sample of rows from a query (Open)
- is duplicated by: ARROW-14201 RAM-efficient topk sink node (Closed)