Details
- Type: Task
- Status: Open
- Priority: Trivial
- Resolution: Unresolved

Environment
{code:bash}
$ rustc --version
rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
{code}
Cargo.toml:
{code}
[dependencies]
parquet = "0.17"
rayon = "1.1"
...
{code}
Description
I have a series of Parquet files, each 181 columns wide, and I'm processing them in parallel using rayon. Doing so, I ran into the OS limit on open file descriptors (1024 by default, according to {{ulimit -n}}): each file being read consumed 208 descriptors.

Is there a deterministic way to calculate how many file descriptors will be used to process a file, so that one can choose an appropriate degree of multithreading in a situation like this?
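For what it's worth, here is the back-of-envelope calculation I'm currently using to cap parallelism. It assumes the per-file descriptor cost is roughly constant (the 208 figure observed above, which may come from the reader duplicating the underlying file handle per column chunk, though I haven't confirmed that in the crate). The function name and the reserved-descriptor count are my own choices, not anything from the parquet or rayon APIs:

```rust
/// Upper bound on the number of files that can be read concurrently
/// without exhausting the process's open-file-descriptor budget.
fn max_concurrent_files(fd_limit: usize, fds_per_file: usize, reserved: usize) -> usize {
    // Reserve a few descriptors for stdio, logging, sockets, etc.,
    // then divide the remainder by the observed per-file cost.
    fd_limit.saturating_sub(reserved) / fds_per_file
}

fn main() {
    // Numbers from this report: ulimit -n = 1024, ~208 descriptors
    // per 181-column file; reserve 16 for everything else.
    let n = max_concurrent_files(1024, 208, 16);
    println!("{}", n); // (1024 - 16) / 208 = 4
}
```

One could then feed that bound into rayon's thread-pool configuration (e.g. via {{ThreadPoolBuilder::num_threads}}), but a deterministic formula for {{fds_per_file}} from the file's schema is what I'm really after.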