Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 1.0.1
Description
When I call arrow::SetCpuThreadPoolCapacity(1), Arrow does this:
1. Spins up a singleton ThreadPool with the default thread count;
2. Sets the number of threads on that ThreadPool to 1 – killing the extra threads.
On my Intel system, I'm forced to spin up four threads to set the CPU thread-pool capacity to 1. This goes against the spirit of the API method – or at least, my understanding of it (and my experience with other thread pools).
My workaround, for calling code: instead of calling arrow::SetCpuThreadPoolCapacity(1), call setenv("OMP_NUM_THREADS", "1", 1).
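A minimal sketch of why the workaround has to run before anything touches the global thread pool. It assumes, as the workaround implies, that the pool's default capacity is derived from OMP_NUM_THREADS. DefaultCapacityFromEnv and LimitThreadsBeforeFirstUse are hypothetical helpers written for illustration, not Arrow functions.

```cpp
#include <stdlib.h>  // setenv (POSIX), getenv, strtol

#include <thread>

// Hypothetical helper, NOT an Arrow API: picks a default pool capacity from
// OMP_NUM_THREADS, falling back to the hardware concurrency when the variable
// is unset or not a positive integer.
int DefaultCapacityFromEnv() {
  const char* value = getenv("OMP_NUM_THREADS");
  if (value != nullptr) {
    char* end = nullptr;
    long parsed = strtol(value, &end, 10);
    if (end != value && *end == '\0' && parsed > 0) {
      return static_cast<int>(parsed);
    }
  }
  unsigned hw = std::thread::hardware_concurrency();
  return hw == 0 ? 1 : static_cast<int>(hw);
}

// Hypothetical helper: call this at program start, before the first use of
// the thread pool, so the pool is created with one thread instead of being
// created at the default size and then shrunk.
void LimitThreadsBeforeFirstUse() {
  setenv("OMP_NUM_THREADS", "1", /*overwrite=*/1);
}
```

After LimitThreadsBeforeFirstUse() has run, DefaultCapacityFromEnv() returns 1. The ordering is the whole point: an environment variable set after the pool already exists cannot undo the threads that were launched.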
Brainstorming: here are some ideas for Arrow's global thread pool that would avoid launching more threads than the limit just to set the limit:
- cpu_thread_pool_capacity could be a global variable, not an attribute on the global ThreadPool. API users would be expected to set the thread-pool capacity before creating the thread pool. (They're probably doing this anyway.)
- SetCpuThreadPoolCapacity() could call setenv("OMP_NUM_THREADS", ...)
- ThreadPool could create threads on-demand instead of in the constructor. An unused ThreadPool would launch zero threads – resolving ARROW-10033 as a side-effect.
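The third idea can be sketched roughly as follows. LazyThreadPool is a hypothetical class written for illustration only; it is not Arrow's ThreadPool and does not reflect how the issue was eventually fixed. Workers are launched inside Submit(), so constructing the pool or changing its capacity starts no threads.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical on-demand pool (an assumption, not Arrow's implementation).
class LazyThreadPool {
 public:
  explicit LazyThreadPool(std::size_t capacity) : capacity_(capacity) {}

  // Lets workers drain the queue, then joins them.
  ~LazyThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (std::thread& worker : workers_) worker.join();
  }

  // Changing the limit launches nothing. This sketch does not shrink a pool
  // whose workers are already running.
  void SetCapacity(std::size_t capacity) {
    std::lock_guard<std::mutex> lock(mutex_);
    capacity_ = capacity;
  }

  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
      // Launch a worker only when there is more queued work than idle
      // workers, and never beyond the configured capacity.
      if (tasks_.size() > idle_ && workers_.size() < capacity_) {
        workers_.emplace_back([this] { WorkerLoop(); });
      }
    }
    cv_.notify_one();
  }

  std::size_t LaunchedThreads() {
    std::lock_guard<std::mutex> lock(mutex_);
    return workers_.size();
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      ++idle_;
      cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
      --idle_;
      if (tasks_.empty()) return;  // only reachable when stopping
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      lock.unlock();
      task();  // run outside the lock so other threads can submit
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  std::size_t capacity_;
  std::size_t idle_ = 0;
  bool stopping_ = false;
};
```

With this shape, constructing LazyThreadPool(4) and then calling SetCapacity(1) launches zero threads, and later submissions run on at most one worker. The trade-off is that the first Submit() pays the thread start-up cost.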
Issue Links
- supersedes ARROW-4633 [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway (Closed)