Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
2.7.0
-
None
-
Reviewed
Description
HistoryFileManager.scanIntermediateDirectory() in the JobHistory Server (JHS) acquires a lock on each user directory it scans (moving or deleting files under the user directory as necessary). The method is called by a thread in JobHistory that periodically scans the intermediate directory, and it can also be called by web server threads for each Web API call made by a JHS client. When there are many concurrent Web API calls or connections to JHS, all but one thread block on the user directory lock. Eventually the client connections time out, but the blocked JHS threads are not killed, which leaves many TCP connections in the CLOSE_WAIT state.
[systest@vb1120 ~]$ sudo netstat -nap | grep 63729 | sort -k 4
tcp 0 0 10.17.202.19:10020 0.0.0.0:* LISTEN 63729/java
tcp 0 0 10.17.202.19:10020 10.17.198.30:33010 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10020 10.17.200.30:33980 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10020 10.17.202.10:59625 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10020 10.17.202.13:35765 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10033 0.0.0.0:* LISTEN 63729/java
tcp 0 0 10.17.202.19:19888 0.0.0.0:* LISTEN 63729/java
tcp 0 0 10.17.202.19:19888 10.17.198.30:35103 ESTABLISHED 63729/java
tcp 277 0 10.17.202.19:19888 10.17.198.30:43670 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:19888 10.17.198.30:45453 ESTABLISHED 63729/java
tcp 277 0 10.17.202.19:19888 10.17.198.30:49184 ESTABLISHED 63729/java
tcp 1 0 10.17.202.19:19888 10.17.202.13:49992 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52703 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52707 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52708 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52710 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52714 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52723 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52726 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52727 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52739 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52749 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52753 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52757 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52760 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52820 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52827 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52829 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52831 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52833 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52836 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52839 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52841 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52843 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52850 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52860 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52876 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52879 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52881 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52884 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52886 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52888 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52891 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52893 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52896 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52898 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52899 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52902 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52909 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52910 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52912 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52923 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52925 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52927 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52930 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52937 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52939 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52945 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52947 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52969 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52972 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52975 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53004 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53007 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53009 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53011 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53052 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53058 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53059 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53063 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53071 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53084 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53093 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53095 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53097 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53101 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53104 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53106 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53108 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53110 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53112 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53114 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53115 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53117 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53121 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53123 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53125 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53127 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53129 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53131 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53134 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53138 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53140 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53153 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53155 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53157 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53159 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53173 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53176 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53177 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53178 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53179 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53181 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53183 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53201 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53204 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53218 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53267 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53270 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53275 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53278 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53280 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53283 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53293 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53296 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53299 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53309 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53312 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53314 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53317 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53320 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53322 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53338 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53340 CLOSE_WAIT 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53364 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53366 ESTABLISHED 63729/java
tcp 260 0 10.17.202.19:19888 10.17.202.13:53367 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53380 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53382 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53386 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53390 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53392 ESTABLISHED 63729/java
tcp 1278 0 10.17.202.19:19888 10.17.202.18:45301 CLOSE_WAIT 63729/java
tcp 1278 0 10.17.202.19:19888 10.17.202.18:45303 CLOSE_WAIT 63729/java
tcp 1277 0 10.17.202.19:19888 10.17.202.18:45306 ESTABLISHED 63729/java
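The contention described above can be reproduced in miniature. The following is only a sketch, not Hadoop code and not the patch that resolved this issue: the class UserDirScanSketch, its methods, and the 20 ms scan cost are all hypothetical stand-ins. It contrasts the coarse pattern, where every caller waits for the user directory lock and then rescans, with one possible mitigation, where a caller skips the scan if another thread already holds the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a single user directory; not the real HistoryFileManager.
class UserDirScanSketch {
    private final ReentrantLock userDirLock = new ReentrantLock();
    final AtomicInteger scans = new AtomicInteger();    // scans actually performed
    final AtomicInteger skipped = new AtomicInteger();  // callers that did not scan
    private final long scanMillis;                      // assumed cost of one scan

    UserDirScanSketch(long scanMillis) {
        this.scanMillis = scanMillis;
    }

    // Coarse pattern: every caller queues on the lock, then repeats the scan.
    void scanBlocking() throws InterruptedException {
        userDirLock.lock();
        try {
            doScan();
        } finally {
            userDirLock.unlock();
        }
    }

    // Possible mitigation: if a scan is already running, return without waiting.
    boolean scanIfIdle() throws InterruptedException {
        if (!userDirLock.tryLock()) {
            skipped.incrementAndGet();
            return false;
        }
        try {
            doScan();
            return true;
        } finally {
            userDirLock.unlock();
        }
    }

    // Simulates listing the directory and moving or deleting files.
    private void doScan() throws InterruptedException {
        scans.incrementAndGet();
        Thread.sleep(scanMillis);
    }

    // Starts `threads` simulated web server threads that all hit the same directory.
    static UserDirScanSketch run(int threads, boolean blocking) throws InterruptedException {
        UserDirScanSketch dir = new UserDirScanSketch(20);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await();  // release all callers at the same moment
                if (blocking) {
                    dir.scanBlocking();
                } else {
                    dir.scanIfIdle();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return dir;
    }

    public static void main(String[] args) throws InterruptedException {
        UserDirScanSketch blocking = run(8, true);
        System.out.println("blocking: scans=" + blocking.scans.get());
        UserDirScanSketch idle = run(8, false);
        System.out.println("tryLock:  scans=" + idle.scans.get()
                + " skipped=" + idle.skipped.get());
    }
}
```

In the blocking variant the total time grows with the number of callers, because each one waits for all earlier scans and then scans again. The tryLock variant trades that wait for a possibly stale view of the directory, so whether it is acceptable depends on how fresh the Web API results need to be.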
Attachments
Issue Links
- is related to
  - MAPREDUCE-6797 Job history server scans can become blocked on a single, slow entry (Resolved)
  - MAPREDUCE-6698 Increase timeout on TestUnnecessaryBlockingOnHistoryFileInfo.testTwoThreadsQueryingDifferentJobOfSameUser (Resolved)
- relates to
  - MAPREDUCE-6573 Reduce the time of calling scanIntermediateDirectory in getFileInfo (Open)
  - MAPREDUCE-6436 JobHistory cache issue (Closed)