Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
I did a quick investigation of the performance of WinUtils in YARN. On average, the NM calls WinUtils 4.76 times per second and 65.51 times per container.
Operation | Requests | Requests/sec | Requests/min | Requests/container |
Sum [WinUtils] | 135354 | 4.761 | 286.160 | 65.51 |
[WinUtils] Execute -help | 4148 | 0.145 | 8.769 | 2.007 |
[WinUtils] Execute -ls | 2842 | 0.0999 | 6.008 | 1.37 |
[WinUtils] Execute -systeminfo | 9153 | 0.321 | 19.35 | 4.43 |
[WinUtils] Execute -symlink | 115096 | 4.048 | 243.33 | 57.37 |
[WinUtils] Execute -task isAlive | 4115 | 0.144 | 8.699 | 2.05 |
Interval: 7 hours, 53 minutes and 48 seconds
Each execution of WinUtils does around 140 IO ops, of which 130 are DDL ops.
At 4.761 calls per second and around 140 IO ops per call, this means 666.58 IO ops/second due to WinUtils.
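The arithmetic behind these rates can be checked with a small sketch. The class and method names below (`WinUtilsLoad`, `callsPerSecond`, `ioOpsPerSecond`) are hypothetical and are not part of Hadoop. The inputs are the total request count and interval from the table, plus the approximate figure of 140 IO ops per call.

```java
import java.time.Duration;

// Hypothetical helper, not part of Hadoop: recomputes the rates quoted above.
public class WinUtilsLoad {

    // Average number of WinUtils calls per second over the measured interval.
    static double callsPerSecond(long totalCalls, Duration interval) {
        return (double) totalCalls / interval.getSeconds();
    }

    // IO operations per second, given an (approximate) IO cost per call.
    static double ioOpsPerSecond(double callsPerSecond, int ioOpsPerCall) {
        return callsPerSecond * ioOpsPerCall;
    }

    public static void main(String[] args) {
        // Interval from the table: 7 hours, 53 minutes and 48 seconds.
        Duration interval = Duration.ofHours(7).plusMinutes(53).plusSeconds(48);
        long totalCalls = 135354;  // "Sum [WinUtils]" row
        int ioOpsPerCall = 140;    // approximate figure from the description

        double cps = callsPerSecond(totalCalls, interval);
        double iops = ioOpsPerSecond(cps, ioOpsPerCall);

        System.out.printf("calls/sec  = %.3f%n", cps);
        System.out.printf("IO ops/sec = %.2f%n", iops);
    }
}
```

Because the 140 IO ops per call is itself an estimate, the resulting IO rate should be read as an order-of-magnitude figure rather than an exact measurement.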
We should start considering removing WinUtils from Hadoop and creating a JNI interface instead.