[HADOOP-13263] Reload cached groups in background after expiry - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8.0, 3.0.0-alpha1, 2.7.6
Component/s: None
Labels: None

Hadoop Flags: Reviewed
Release Note:

hadoop.security.groups.cache.background.reload can be set to true to enable background reload of expired groups cache entries. This setting can improve the performance of services that use Groups.java (e.g. the NameNode) when group lookups are slow. The setting is disabled by default.
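As an illustration, the setting could be enabled with an override like the one below. This is a minimal sketch that assumes the property is set in core-site.xml, as is usual for hadoop.security.* settings; only the property name and value come from the release note.

```xml
<!-- core-site.xml (assumed location): opt in to background reload of expired group cache entries -->
<property>
  <name>hadoop.security.groups.cache.background.reload</name>
  <value>true</value>
</property>
```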

Description

In HADOOP-11238, the Guava cache was introduced to allow refreshes of the NameNode group cache to run in the background, avoiding many slow group lookups. Even with this change, I have seen quite a few clusters with issues due to slow group lookups. The problem is most prevalent in HA clusters, where a slow group lookup for the hdfs user can fail to return for over 45 seconds, causing the Failover Controller to kill the NameNode.

The way the current Guava cache implementation works is approximately:

1) On initial load, the first thread to request groups for a given user blocks until the lookup returns. Any subsequent threads requesting that user block until that first thread populates the cache.

2) When the key expires, the first thread to hit the cache after expiry blocks. While it is blocked, other threads will return the old value.

I feel it is this blocking thread that still gives the NameNode issues on slow group lookups. If the call from the FC is the one that blocks and lookups are slow, it can cause the NN to be killed.

Guava has the ability to refresh expired keys completely in the background, where the first thread that hits an expired key schedules a background cache reload, but still returns the old value. Then the cache is eventually updated. This patch introduces this background reload feature. There are two new parameters:

1) hadoop.security.groups.cache.background.reload - default false, to keep the current behaviour. Set to true to enable a small thread pool and background refresh for expired keys.

2) hadoop.security.groups.cache.background.reload.threads - only relevant if the above is set to true. Controls how many threads are in the background refresh pool. Default is 1, which is likely to be enough.
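The background reload behaviour described above can be sketched with plain JDK classes. This is not the patch itself and does not use Guava: `StaleWhileReloadCache`, `loader`, `expiryMillis`, and `reloadThreads` are hypothetical names chosen for illustration, with `reloadThreads` playing the role of hadoop.security.groups.cache.background.reload.threads. The first request for a key blocks on the loader; once an entry has expired, the caller that notices schedules a reload on a small pool and still gets the old value back.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

/** Hypothetical illustration only; not the Hadoop Groups class or Guava's cache. */
class StaleWhileReloadCache<K, V> {

  /** A cached value plus the time it was loaded and a "reload already queued" flag. */
  private static final class Entry<V> {
    final V value;
    final long loadedAtMillis;
    final AtomicBoolean reloadScheduled = new AtomicBoolean(false);

    Entry(V value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final Function<K, V> loader;      // stands in for the slow group lookup
  private final long expiryMillis;
  private final ExecutorService reloadPool; // small background refresh pool

  StaleWhileReloadCache(Function<K, V> loader, long expiryMillis, int reloadThreads) {
    this.loader = loader;
    this.expiryMillis = expiryMillis;
    this.reloadPool = Executors.newFixedThreadPool(reloadThreads, r -> {
      Thread t = new Thread(r, "cache-reload");
      t.setDaemon(true); // do not keep the JVM alive just for reloads
      return t;
    });
  }

  V get(K key) {
    // Initial load: the first caller runs the loader; concurrent callers
    // for the same key wait inside computeIfAbsent until it finishes.
    Entry<V> e = entries.computeIfAbsent(
        key, k -> new Entry<>(loader.apply(k), System.currentTimeMillis()));

    boolean expired = System.currentTimeMillis() - e.loadedAtMillis >= expiryMillis;
    // Only the first caller to see the expired entry queues a reload.
    if (expired && e.reloadScheduled.compareAndSet(false, true)) {
      reloadPool.execute(() -> {
        try {
          entries.put(key, new Entry<>(loader.apply(key), System.currentTimeMillis()));
        } catch (RuntimeException ex) {
          // Keep serving the old value and let a later caller retry the reload.
          e.reloadScheduled.set(false);
        }
      });
    }
    // Nobody blocks on an expired key: the old value is returned immediately.
    return e.value;
  }

  void shutdown() {
    reloadPool.shutdown();
  }
}
```

In this sketch a slow loader only delays callers on the very first lookup for a key. After that, a slow or failing lookup leaves the previous value in place until a reload succeeds, which is the trade-off the issue describes: callers may briefly see stale groups in exchange for never blocking on expiry.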

Attachments


- HADOOP-13263.001.patch, 13/Jun/16 21:45, 14 kB, Stephen O'Donnell
- HADOOP-13263.002.patch, 16/Jun/16 21:30, 18 kB, Stephen O'Donnell
- HADOOP-13263.003.patch, 17/Jun/16 21:25, 20 kB, Stephen O'Donnell
- HADOOP-13263.004.patch, 20/Jun/16 20:55, 20 kB, Stephen O'Donnell
- HADOOP-13263.005.patch, 21/Jun/16 22:07, 20 kB, Stephen O'Donnell
- HADOOP-13263.006.patch, 22/Jun/16 11:30, 20 kB, Stephen O'Donnell
- HADOOP-13263.007.patch, 25/Jun/16 19:55, 24 kB, Stephen O'Donnell

Issue Links

breaks: HADOOP-13375 o.a.h.security.TestGroupsCaching.testBackgroundRefreshCounters seems flaky (Resolved)

is related to: HADOOP-15614 TestGroupsCaching.testExceptionOnBackgroundRefreshHandled reliably fails (Resolved)

People

Assignee: Stephen O'Donnell

Reporter: Stephen O'Donnell

Votes: 0

Watchers: 13

Dates

Created: 13/Jun/16 21:12

Updated: 02/Oct/19 17:14

Resolved: 27/Jun/16 16:55