Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
3.4.0, 2.10.2, 3.3.2, 3.2.4
Description
In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With this configuration, a SecretManager.InvalidToken exception may be thrown if the access token expires during a short-circuit read. The read itself is not affected, because the failed read is retried, but each such failure leaks a ShortCircuitShm.Slot object.
We found this problem in our secure HBase clusters. With short-circuit reads in use, the number of open file descriptors held by the RegionServers kept increasing.
The increase was caused by leaked shared memory segments used by short-circuit reads.
[root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | wc -l
3925
[root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | head -5
java 86309 hbase DEL REG 0,19 2308279984 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
java 86309 hbase DEL REG 0,19 2306359893 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
java 86309 hbase DEL REG 0,19 2305496758 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
java 86309 hbase DEL REG 0,19 2304784261 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
java 86309 hbase DEL REG 0,19 2302621988 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590
We eventually traced the root cause to leaked ShortCircuitShm.Slot objects.
The fix is trivial: free the slot when the InvalidToken exception is thrown.
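The pattern behind the fix can be sketched with a toy model. This is only an illustration under stated assumptions, not the actual patch: SlotPool, InvalidTokenException, requestFileDescriptors and openReplica are hypothetical stand-ins and do not correspond to real Hadoop classes or method signatures. The sketch allocates a slot, attempts the request, and releases the slot before rethrowing when the token is rejected.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins for illustration only; these are NOT the real Hadoop types.
class SlotLeakSketch {

    /** Stand-in for SecretManager.InvalidToken. */
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String msg) { super(msg); }
    }

    /** Toy shared memory segment that hands out a fixed number of slots. */
    static class SlotPool {
        private final Deque<Integer> free = new ArrayDeque<>();
        private final int capacity;

        SlotPool(int capacity) {
            this.capacity = capacity;
            for (int i = 0; i < capacity; i++) free.push(i);
        }

        int allocate() {
            if (free.isEmpty()) throw new IllegalStateException("no free slots");
            return free.pop();
        }

        void release(int slot) { free.push(slot); }

        int inUse() { return capacity - free.size(); }
    }

    /** Simulates the short-circuit request; fails when the token has expired. */
    static void requestFileDescriptors(boolean tokenExpired) throws InvalidTokenException {
        if (tokenExpired) throw new InvalidTokenException("access token expired");
    }

    /** Returns the slot on success; on InvalidToken, frees the slot and rethrows. */
    static int openReplica(SlotPool pool, boolean tokenExpired) throws InvalidTokenException {
        int slot = pool.allocate();
        try {
            requestFileDescriptors(tokenExpired);
            return slot;
        } catch (InvalidTokenException e) {
            pool.release(slot); // without this line, every failed attempt leaks a slot
            throw e;
        }
    }

    /** Runs some failed attempts, then one successful retry; returns slots still in use. */
    static int slotsInUseAfterRetries(int failedAttempts) {
        SlotPool pool = new SlotPool(4);
        for (int i = 0; i < failedAttempts; i++) {
            try {
                openReplica(pool, true);
            } catch (InvalidTokenException e) {
                // the caller retries, as described above
            }
        }
        try {
            openReplica(pool, false);
        } catch (InvalidTokenException e) {
            throw new AssertionError(e);
        }
        return pool.inUse();
    }

    public static void main(String[] args) {
        System.out.println(slotsInUseAfterRetries(10)); // prints 1
    }
}
```

In this toy model, removing the release call in the catch block exhausts the four-slot pool on the fifth failed attempt, which mirrors on a small scale the steady growth in shared memory segments shown above.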