Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
- None
Description
We saw many idle threads during a test of EC reconstruction reads.
With 14 DNs, write ten 10 GB files (EC: 10+4) with 10 threads using ockg:
./bin/ozone freon ockg -p test -n 10 -t 10 -s $((10*1024*1024*1024))
Then kill 4 DNs and use ockv to read back and validate the keys:
./bin/ozone freon ockv -p test -n 10 -t 10
As the test proceeded, the number of EC reconstruction read threads grew beyond 1000:
1024 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.014 false false
1025 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.015 false false
1026 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.017 false false
1027 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.015 false false
1028 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.014 false false
1057 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.041 false false
1059 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.014 false false
1060 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.015 false false
1061 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.014 false false
1062 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.014 false false
1063 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.015 false false
1064 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.011 false false
1065 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.020 false false
1066 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.013 false false
1067 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.015 false false
1068 ec-reader-for-conID: 4 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.013 false false
1069 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.024 false false
1070 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.032 false false
1071 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.020 false false
1072 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.022 false false
1073 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.021 false false
1074 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.018 false false
1075 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.020 false false
1076 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.025 false false
1077 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.019 false false
1078 ec-reader-for-conID: 5 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.027 false false
1089 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.018 false false
1090 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.019 false false
1092 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.020 false false
1095 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.016 false false
1096 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.016 false false
1098 ec-reader-for-conID: 3 locID: 10961100 main 5 WAITING 0.0 0.000 0:0.016 false false
Currently, a thread pool of size 10 is created for each block group with EC 10+4, and the pool is shut down when the InputStream for that block group is closed. However, as far as I can tell from the code, the BlockExtendedInputStreams for all blocks of a key are closed together only when the KeyInputStream is closed (this design may be intended to support seeking backward during a read).
As a result, reading a large key such as a 10 GB file leaves many idle threads in the WAITING state. This does not scale well for concurrent reconstruction reads, and I think even a key-level thread pool would not scale.
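A minimal sketch of why the threads pile up, under simplifying assumptions: `BlockGroupReader` and `idleThreadsBeforeKeyClose` are hypothetical names rather than Ozone classes, each block-group pool is sized to the EC data count (10 for 10+4), and reading a block group is reduced to submitting one no-op task per data unit. The readers are closed only when the whole key is closed, mirroring the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PerBlockGroupPools {

  /** Hypothetical block-group reader that owns a private fixed-size pool. */
  static final class BlockGroupReader implements AutoCloseable {
    private final int dataCount;
    private final ThreadPoolExecutor pool;

    BlockGroupReader(int dataCount) {
      this.dataCount = dataCount;
      // Same shape as Executors.newFixedThreadPool(dataCount), but with daemon
      // threads. Core threads never time out, so once a read finishes they stay
      // parked on the empty queue (WAITING) until shutdown().
      this.pool = new ThreadPoolExecutor(dataCount, dataCount, 0L, TimeUnit.MILLISECONDS,
          new LinkedBlockingQueue<Runnable>(), r -> {
            Thread t = new Thread(r, "ec-reader");
            t.setDaemon(true);
            return t;
          });
    }

    /** Simulates one reconstruction read: one no-op task per data unit. */
    void reconstructRead() {
      List<Future<?>> pending = new ArrayList<>();
      for (int i = 0; i < dataCount; i++) {
        pending.add(pool.submit(() -> { }));
      }
      try {
        for (Future<?> f : pending) {
          f.get();
        }
      } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException(e);
      }
    }

    int threadCount() {
      return pool.getPoolSize();
    }

    @Override
    public void close() {
      pool.shutdown();
    }
  }

  /**
   * Reads every block group of one key, keeping each reader open until the
   * whole key is closed, and returns the pool threads alive just before that.
   */
  static int idleThreadsBeforeKeyClose(int blockGroups, int dataCount) {
    List<BlockGroupReader> readers = new ArrayList<>();
    try {
      for (int g = 0; g < blockGroups; g++) {
        BlockGroupReader reader = new BlockGroupReader(dataCount);
        readers.add(reader);
        reader.reconstructRead();
      }
      int total = 0;
      for (BlockGroupReader reader : readers) {
        total += reader.threadCount();
      }
      return total;
    } finally {
      // Only closing the key releases the per-block-group pools.
      for (BlockGroupReader reader : readers) {
        reader.close();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(idleThreadsBeforeKeyClose(4, 10));
  }
}
```

For a key with 4 block groups (an arbitrary number chosen for illustration), `idleThreadsBeforeKeyClose(4, 10)` reports 40 threads that sit parked in their pools until the key is closed; the count grows with every additional block group and every concurrently open key.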
Instead, we could use a single client-wide thread pool for EC reconstruction reads. That would keep the number of threads bounded, and the pool size could be made configurable to suit different workloads.
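A minimal sketch of the proposal, assuming one pool object is created per client and shared by every reconstruction reader of that client. `SharedReconstructPool` is a hypothetical name, and `maxThreads` stands in for a client configuration setting whose key the issue does not name. Core threads are allowed to time out, so an idle client eventually holds no pool threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client-wide pool shared by all EC reconstruction readers. */
public final class SharedReconstructPool implements AutoCloseable {
  private final ThreadPoolExecutor executor;

  public SharedReconstructPool(int maxThreads) {
    AtomicInteger ids = new AtomicInteger();
    // Hard upper bound of maxThreads; extra tasks wait in the queue instead of
    // creating more threads.
    executor = new ThreadPoolExecutor(maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(), r -> {
          Thread t = new Thread(r, "ec-reconstruct-" + ids.incrementAndGet());
          t.setDaemon(true);
          return t;
        });
    // Idle threads exit after 60 s, so a quiet client holds no pool threads.
    executor.allowCoreThreadTimeOut(true);
  }

  /** Every block-group reader of the client submits its work here. */
  public ExecutorService executor() {
    return executor;
  }

  /** Largest number of threads the pool has ever had at the same time. */
  public int peakThreads() {
    return executor.getLargestPoolSize();
  }

  @Override
  public void close() {
    executor.shutdown();
  }

  /** Simulates many keys and block groups reading through one shared pool. */
  static int simulate(int keys, int blockGroupsPerKey, int dataCount, int maxThreads) {
    try (SharedReconstructPool shared = new SharedReconstructPool(maxThreads)) {
      List<Future<?>> pending = new ArrayList<>();
      for (int k = 0; k < keys; k++) {
        for (int g = 0; g < blockGroupsPerKey; g++) {
          for (int i = 0; i < dataCount; i++) {
            // The no-op task stands in for one cell read.
            pending.add(shared.executor().submit(() -> { }));
          }
        }
      }
      for (Future<?> f : pending) {
        f.get();
      }
      return shared.peakThreads();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(simulate(10, 4, 10, 32));
  }
}
```

Reading 10 keys of 4 block groups each (400 simulated cell reads) through a pool created with `maxThreads = 32` never uses more than 32 threads, whereas the per-block-group design would hold 400. The unbounded queue is the simplest choice for a sketch; a bounded queue with a rejection policy would be an alternative if memory for queued work became a concern.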
Issue Links
- relates to
  - HDDS-6424 KeyInputStream should create BlockStreams lazily (Open)
- links to