[HDFS-11881] NameNode consumes a lot of memory for snapshot diff report generation - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0-alpha1
Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
Component/s: hdfs, snapshots
Labels: None
Target Version/s: 3.0.0-alpha4
Hadoop Flags: Reviewed

Description

Problem:
HDFS supports a snapshot diff tool which can generate a detailed report of modified, created, deleted and renamed files between any 2 snapshots.

hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>

However, if the diff list between 2 snapshots is huge, on the order of millions of entries, the NameNode can consume a lot of memory while generating the diff report. In a few cases, we have seen the NameNode go into a long GC pause lasting a few minutes to make room for this burst in memory requirement during snapshot diff report generation.

Root Cause:

NameNode tries to generate the diff report with all diff entries at once, which puts undue pressure on its heap.
Each diff report entry has, at a minimum, the diff type (enum), a source path byte array, and a destination path byte array. Take file deletions as an example. For a deleted file, only one of the source or destination paths is present in the diff report entry. Assume the paths of these deleted files take 128 bytes on average. 4 million file deletions captured in a diff report will thus need about 512 MB of memory for the paths alone.
The snapshot diff report uses a plain Java ArrayList, which allocates a larger contiguous backing array (about 1.5x the old capacity in OpenJDK) and copies the elements over every time the size exceeds the current capacity. So, a 512 MB memory requirement might internally be asking for a much larger contiguous memory chunk.
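The estimate above can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, object headers and the enum and reference fields are ignored, and the growth rule (`oldCapacity + (oldCapacity >> 1)`, starting from a capacity of 10) is an assumption based on OpenJDK's ArrayList rather than anything stated in this issue. The backing array of an ArrayList holds references to the entries, not the path bytes themselves.

```java
public class DiffReportMemoryEstimate {

    /** Bytes needed for path data alone, assuming one path per entry. */
    public static long pathBytes(long entries, long avgPathBytes) {
        return entries * avgPathBytes;
    }

    /**
     * Capacity of a growing backing array once it can hold 'entries' slots,
     * assuming it grows by about 50% on each resize (assumed OpenJDK policy).
     */
    public static long finalCapacity(long entries, long initialCapacity) {
        long capacity = Math.max(1, initialCapacity);
        while (capacity < entries) {
            // Guarantee progress even for tiny capacities where cap >> 1 is 0.
            capacity = Math.max(capacity + 1, capacity + (capacity >> 1));
        }
        return capacity;
    }

    public static void main(String[] args) {
        long entries = 4_000_000L;
        long bytes = pathBytes(entries, 128);
        System.out.println("Path bytes: " + bytes + " (" + bytes / 1_000_000 + " MB)");

        // During a resize the old and the new backing arrays are both live.
        long capacity = finalCapacity(entries, 10);
        System.out.println("Backing array slots after last resize: " + capacity);
    }
}
```

The slack between the final capacity and the 4 million entries actually stored is memory that a list made of fixed-size chunks would not need to reserve in one contiguous piece.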

Proposal:

Make the NameNode snapshot diff report service follow the batch model (like the directory listing service). Clients (the hdfs snapshotDiff command) will then receive the diff report in small batches and need to iterate several times to get the full list.
Additionally, the snapshot diff report service in the NameNode can use the ChunkedArrayList data structure instead of the current ArrayList, to avoid fragmentation and the need for one large contiguous memory chunk.
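The batch model proposed above could look roughly like the following. This is a minimal sketch of the protocol shape only, not the HDFS RPC: `Batch`, `getBatch`, `fetchAll`, and the integer cursor are all hypothetical names introduced for illustration, and the "server" here holds an in-memory list, whereas a real NameNode would want to compute each batch without materializing the full report.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedDiffSketch {

    /** One batch of entries plus the cursor for the next call (-1 when done). */
    public static final class Batch {
        public final List<String> entries;
        public final int nextIndex;

        Batch(List<String> entries, int nextIndex) {
            this.entries = entries;
            this.nextIndex = nextIndex;
        }
    }

    /** Hypothetical server side: return at most batchSize entries from startIndex. */
    public static Batch getBatch(List<String> allEntries, int startIndex, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int end = Math.min(startIndex + batchSize, allEntries.size());
        List<String> slice = new ArrayList<>(allEntries.subList(startIndex, end));
        int next = end < allEntries.size() ? end : -1;
        return new Batch(slice, next);
    }

    /** Hypothetical client side: keep asking until the cursor is -1. Returns the call count. */
    public static int fetchAll(List<String> allEntries, int batchSize, List<String> sink) {
        int calls = 0;
        int cursor = 0;
        while (cursor != -1) {
            Batch batch = getBatch(allEntries, cursor, batchSize);
            sink.addAll(batch.entries);
            cursor = batch.nextIndex;
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> diff = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            diff.add("- /dir/file" + i);
        }
        List<String> received = new ArrayList<>();
        int calls = fetchAll(diff, 4, received);
        System.out.println(received.size() + " entries in " + calls + " calls");
    }
}
```

With this shape, the memory held per request on the serving side is bounded by the batch size rather than by the total number of diff entries, at the cost of extra round trips for the client.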

Attachments

1_ChunkedArrayList_SnapshotDiffReport.png (24/Jun/17 02:33, 148 kB, Manoj Govindassamy)
2_ArrayList_SnapshotDiffReport.png (24/Jun/17 02:33, 146 kB, Manoj Govindassamy)
HDFS-11881.01.patch (13/Jun/17 00:50, 7 kB, Manoj Govindassamy)

Issue Links

relates to: HDFS-12042 Lazy initialize AbstractINodeDiffList#diffs for snapshots to reduce memory consumption (Resolved)

People

Assignee: Manoj Govindassamy

Reporter: Manoj Govindassamy

Votes: 1

Watchers: 11

Dates

Created: 24/May/17 19:41

Updated: 02/Oct/19 17:15

Resolved: 29/Jun/17 13:43