Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Duplicate
-
None
-
None
Description
DataFusion currently spawns one thread per partition and this results in poor performance if there are more partitions than available cores/threads. It would be better to have a thread-pool that defaults to number of available cores.
Here is a Google doc where we can collaborate on a design discussion.
https://docs.google.com/document/d/1_wc6diy3YrRgEIhVIGzrO5AK8yhwfjWlmKtGnvbsrrY/edit?usp=sharing