[HBASE-22867] The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
Component/s: master
Labels: None
Hadoop Flags: Reviewed
Release Note:
Replaces the ForkJoinPool in CleanerChore with a ThreadPoolExecutor, which caps the number of spawned threads and avoids frequent GC pauses on the master. The replacement is internal to CleanerChore, so no config keys change; users can simply upgrade the HBase master without any other change.
Tags: master
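
The idea in the release note, capping the worker count with a ThreadPoolExecutor, can be sketched roughly as follows. This is a minimal illustration and not the code from the actual patch: the class name `BoundedScanPool`, the helper names, and the pool size of 4 are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; not part of HBase.
public class BoundedScanPool {

    // Builds a pool whose worker count can never exceed maxThreads.
    // Extra tasks wait in the unbounded queue instead of spawning threads.
    static ThreadPoolExecutor newBoundedPool(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,          // core size == max size: hard cap
            1, TimeUnit.MINUTES,             // idle workers exit after a minute
            new LinkedBlockingQueue<>());    // overflow work is queued
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits taskCount trivial tasks, waits for all of them, and returns
    // the largest number of worker threads the pool ever held.
    static int largestPoolSizeAfter(int maxThreads, int taskCount) throws Exception {
        ThreadPoolExecutor pool = newBoundedPool(maxThreads);
        AtomicInteger done = new AtomicInteger();
        Runnable task = done::incrementAndGet;   // stand-in for a directory scan
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            futures.add(pool.submit(task));
        }
        for (Future<?> f : futures) {
            f.get();                              // wait for completion
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        if (done.get() != taskCount) {
            throw new IllegalStateException("not all tasks ran");
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) throws Exception {
        // However many tasks are queued, the thread count stays within the cap.
        System.out.println("largest pool size: " + largestPoolSizeAfter(4, 1000));
    }
}
```

One caveat with any fixed-size pool: a task that blocks waiting on subtasks submitted to the same pool can starve it, so recursive work needs a non-blocking way to combine results.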

Description

The thousands of spawned threads push the safepoint cost to 80+ s in our Master JVM process.

2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection and attempting reconnect

The JVM stdout shows 9126 threads in total and a safepoint sync time of 80+ s:

vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
32358.859: ForceAsyncSafepoint              [    9126         67            474    ]      [     1    28 86596    87   101    ]  0

We also captured a jstack dump:

$ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
8648

This is a dangerous bug, so I am marking it as a blocker.

Attachments

191318.stack.1 | 22/Aug/19 02:31 | 773 kB | Zheng Hu
191318.stack | 22/Aug/19 02:30 | 638 kB | Zheng Hu
31162.stack.1 | 16/Aug/19 10:32 | 14.84 MB | Zheng Hu

Issue Links

is caused by

HBASE-18309 Support multi threads in CleanerChore

Resolved

is related to

HBASE-22912 [Backport] HBASE-22867 to branch-1 to avoid ForkJoinPool to spawn thousands of threads

Resolved

relates to

HBASE-22871 Move the DirScanPool out and do not use static field

Resolved

links to

GitHub Pull Request #513

People

Assignee: Zheng Hu

Reporter: Zheng Hu

Votes: 0

Watchers: 8

Dates

Created: 16/Aug/19 10:28

Updated: 05/Sep/19 11:51

Resolved: 26/Aug/19 02:04