[ARROW-10038] [C++] SetCpuThreadPoolCapacity(1) spins up nCPUs threads - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.0.1
Fix Version/s: 4.0.0
Component/s: C++
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/26060

Description

When I call arrow::SetCpuThreadPoolCapacity(1), Arrow does this:

1. Spins up a singleton ThreadPool with the default thread count;
2. Sets the number of threads on that ThreadPool to 1 – killing the extra threads.

On my Intel system, I'm forced to spin up four threads to set the CPU thread-pool capacity to 1. This goes against the spirit of the API method – or at least, my understanding of it (and my experience with other thread pools).

My workaround in the calling code: instead of calling arrow::SetCpuThreadPoolCapacity(1), call setenv("OMP_NUM_THREADS", "1", 1).

Brainstorming, here are some ideas for Arrow's global thread pool that would avoid launching more threads than the limit just to set the limit:

- cpu_thread_pool_capacity could be a global variable, not an attribute on the global ThreadPool. API users would be expected to set the thread-pool capacity before creating the thread pool. (They're probably doing this anyway.)
- SetCpuThreadPoolCapacity() could call setenv("OMP_NUM_THREADS", ...).
- ThreadPool could create threads on demand instead of in the constructor. An unused ThreadPool would launch zero threads, resolving ARROW-10033 as a side effect.

Issue Links

supersedes
- ARROW-4633 [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway (Closed)

links to
- GitHub Pull Request #8240

People

Assignee: Antoine Pitrou
Reporter: Adam Hooper
Votes: 0
Watchers: 3

Dates

Created: 18/Sep/20 14:04
Updated: 11/Jan/23 08:10
Resolved: 26/Jan/21 18:33

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 3h 20m