Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
3.0.0, 3.1.4
Description
There is kernel overhead during datanode upgrade. If a datanode has millions of blocks and 10+ disks, the block-layout migration becomes very expensive during its hardlink operation. Slowness is observed when running with a large number of hardlink threads (dfs.datanode.block.id.layout.upgrade.threads, default is 12 threads per disk), and the upgrade runs for 2+ hours.
I.e., 10 * 12 = 120 threads (for 10 disks).
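The thread arithmetic above can be sketched as follows. This is only an illustration: the function name `total_upgrade_threads` is hypothetical and not part of Hadoop. The default of 12 threads per disk is the value quoted in this description.

```python
def total_upgrade_threads(disks: int, threads_per_disk: int = 12) -> int:
    """Total hardlink worker threads running at once during the layout upgrade.

    threads_per_disk mirrors dfs.datanode.block.id.layout.upgrade.threads,
    which (per this report) is applied to each disk separately.
    """
    if disks < 0 or threads_per_disk < 1:
        raise ValueError("disks must be >= 0 and threads_per_disk >= 1")
    return disks * threads_per_disk


# Compare the default with the smaller settings tried in the test below,
# for a datanode with 10 disks.
for threads in (12, 6, 3):
    print(f"{threads} threads/disk x 10 disks = {total_upgrade_threads(10, threads)} threads")
```

With 10 disks, halving the per-disk setting from 12 to 6 halves the number of concurrent hardlink threads from 120 to 60.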
Small test:
RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap
| dfs.datanode.block.id.layout.upgrade.threads | Blocks | Disks | Time taken |
|---|---|---|---|
| 12 | 3.3 million | 1 | 2 minutes and 59 seconds |
| 6 | 3.3 million | 1 | 2 minutes and 35 seconds |
| 3 | 3.3 million | 1 | 2 minutes and 51 seconds |
The same test was run twice and the results were about 95% consistent (only a few seconds' difference between iterations). Using 6 threads is faster than using 12 threads because of the overhead of the additional threads.
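To make the comparison easier to read, the timings in the table can be converted to seconds and to an approximate block rate. This is a rough sketch: the block count of 3.3 million is the rounded figure from the table, so the rates are approximate, and the helper names are made up for this example.

```python
BLOCKS = 3_300_000  # rounded block count from the table (single disk)

# Threads per disk -> (minutes, seconds) taken, copied from the table.
RESULTS = {12: (2, 59), 6: (2, 35), 3: (2, 51)}


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'minutes and seconds' duration to whole seconds."""
    return minutes * 60 + seconds


def blocks_per_second(threads: int) -> float:
    """Approximate hardlink rate for a given thread setting."""
    return BLOCKS / to_seconds(*RESULTS[threads])


def saving_vs_default(threads: int, default: int = 12) -> float:
    """Fraction of wall-clock time saved relative to the default setting."""
    base = to_seconds(*RESULTS[default])
    return (base - to_seconds(*RESULTS[threads])) / base


for threads in RESULTS:
    print(
        f"{threads:>2} threads: {to_seconds(*RESULTS[threads])} s, "
        f"~{blocks_per_second(threads):.0f} blocks/s, "
        f"{saving_vs_default(threads):.1%} less time than 12 threads"
    )
```

On this single-disk test the differences are small (24 seconds between 12 and 6 threads), so the numbers mainly show that extra threads do not help; the 2+ hour case described above involves 10+ disks.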
Issue Links
- is duplicated by: HDFS-9536 OOM errors during parallel upgrade to Block-ID based layout (Resolved)
- links to