Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Fix Version/s: 3.0.0
- Labels: None
Description
While benchmarking Spark 2.4.0 on HPC (High Performance Computing) hardware, we identified an opportunity to improve RPC performance on a large number of HPC nodes with Omni-Path NICs. Currently, the same thread configurations are used for both the driver and the executors. Our tests show that the driver and executors should have different thread configurations, because the driver handles far more RPC messages than any single executor.
These configurations are:
Common Config Key | Driver Config Key | Executor Config Key
---|---|---
spark.rpc.io.serverThreads | spark.driver.rpc.io.serverThreads | spark.executor.rpc.io.serverThreads
spark.rpc.io.clientThreads | spark.driver.rpc.io.clientThreads | spark.executor.rpc.io.clientThreads
spark.rpc.netty.dispatcher.numThreads | spark.driver.rpc.netty.dispatcher.numThreads | spark.executor.rpc.netty.dispatcher.numThreads
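For illustration only, a minimal sketch of how the driver-side overrides could be set through SparkConf once these keys are available; the thread counts below are arbitrary placeholders taken from the test setup, not recommendations:

```scala
import org.apache.spark.SparkConf

// Hypothetical usage: override the driver-side RPC thread counts while the
// executors keep falling back to the shared spark.rpc.* settings.
val conf = new SparkConf()
  .set("spark.driver.rpc.io.serverThreads", "15")
  .set("spark.driver.rpc.io.clientThreads", "15")
  .set("spark.driver.rpc.netty.dispatcher.numThreads", "10")
```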
When Spark reads a thread configuration, it first tries the driver-side or executor-side configuration (depending on the role), and then falls back to the common thread configuration, as sketched below.
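A minimal sketch of that fallback lookup, assuming a helper that takes the role ("driver" or "executor") and the key suffix; the function name and signature are illustrative, not the actual patch:

```scala
// Illustrative only: the role-specific key wins, then the shared key, then a default.
def numRpcThreads(conf: Map[String, String], role: String,
                  suffix: String, default: Int): Int = {
  conf.get(s"spark.$role.rpc.$suffix")        // e.g. spark.driver.rpc.io.serverThreads
    .orElse(conf.get(s"spark.rpc.$suffix"))   // fall back to spark.rpc.io.serverThreads
    .map(_.toInt)
    .getOrElse(default)
}
```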
After the separation, performance improved substantially at 256 and 512 nodes. See the SimpleMapTask results below.
Nodes | spark.driver.rpc.io.serverThreads | spark.driver.rpc.io.clientThreads | spark.driver.rpc.netty.dispatcher.numThreads | spark.executor.rpc.netty.dispatcher.numThreads | Overall Time (s) | Overall Time without Separation (s) | Improvement
---|---|---|---|---|---|---|---
128 nodes | 15 | 15 | 10 | 30 | 107 | 108 | 0.9%
256 nodes | 12 | 15 | 10 | 30 | 159 | 196 | 18.8%
512 nodes | 12 | 15 | 10 | 30 | 283 | 377 | 24.9%
The implementation is almost done. We are working on the code merge.