[HDFS-11661] GetContentSummary uses excessive amounts of memory - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.8.0, 3.0.0-alpha2
Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
Component/s: namenode
Labels: None
Target Version/s: 3.0.0-alpha4, 2.8.1
Hadoop Flags: Reviewed
Release Note: Reverted HDFS-10797 to fix a scalability regression introduced by that commit.


Description

ContentSummaryComputationContext::nodeIncluded() is used to keep track of every INode visited during the current content summary calculation. When the summary is computed at or near the root, that can be every INode in the filesystem, which makes for a VERY large hash table. This simply won't work on large filesystems.

We noticed this after upgrading a NameNode with ~100 million filesystem objects: it was spending significantly more time in GC. Fortunately, this system had some memory headroom; other clusters of ours will not run with this additional demand on memory.

This was added as part of HDFS-10797 as a way of keeping track of INodes that have already been accounted for, so that they are not counted twice.
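The memory problem follows from the deduplication pattern itself: a set that remembers every node visited during one traversal grows linearly with the number of nodes traversed. The Java sketch below is a minimal, hypothetical illustration of that pattern only. `Node`, `Result`, `buildTree`, `countWithVisitedSet`, and `countWithoutVisitedSet` are invented names, and this is not the actual `ContentSummaryComputationContext` code from HDFS-10797. A single file reachable from two directories stands in for a node that could otherwise be counted twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the dedup-by-visited-set pattern; not Hadoop code.
public class VisitedSetSketch {

    // Hypothetical stand-in for an INode: a file has no children, a directory has some.
    static final class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // How many nodes were counted, and how large the visited set grew.
    static final class Result {
        final long counted;
        final int visitedSetSize;

        Result(long counted, int visitedSetSize) {
            this.counted = counted;
            this.visitedSetSize = visitedSetSize;
        }
    }

    // Builds root -> `dirs` directories -> `filesPerDir` files each, plus one
    // file shared by the first two directories (reachable along two paths).
    static Node buildTree(int dirs, int filesPerDir) {
        Node root = new Node();
        Node shared = new Node();
        for (int d = 0; d < dirs; d++) {
            Node dir = new Node();
            for (int f = 0; f < filesPerDir; f++) {
                dir.add(new Node());
            }
            if (d < 2) {
                dir.add(shared);
            }
            root.add(dir);
        }
        return root;
    }

    // Counts each reachable node once by remembering every visited node.
    // The set holds a reference to every node in the subtree until the
    // traversal ends, so its memory grows linearly with the subtree size.
    static Result countWithVisitedSet(Node root) {
        Set<Node> visited = new HashSet<>(); // identity-based: Node does not override equals
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        long counted = 0;
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (!visited.add(n)) {
                continue; // already accounted for: skip to avoid double counting
            }
            counted++;
            for (Node c : n.children) {
                stack.push(c);
            }
        }
        return new Result(counted, visited.size());
    }

    // Same traversal with no set: constant extra memory, but a node reachable
    // along two paths is counted twice.
    static long countWithoutVisitedSet(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        long counted = 0;
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            counted++;
            for (Node c : n.children) {
                stack.push(c);
            }
        }
        return counted;
    }

    public static void main(String[] args) {
        Node root = buildTree(10, 100);
        Result r = countWithVisitedSet(root);
        System.out.println("with visited set: counted=" + r.counted + ", set size=" + r.visitedSetSize);
        System.out.println("without visited set: counted=" + countWithoutVisitedSet(root));
    }
}
```

With 10 directories of 100 files each plus one shared file, the deduplicating traversal counts 1012 nodes and its set ends up holding 1012 references, while the plain traversal counts the shared file twice (1013). In this model the set's size tracks the size of the summarized subtree, so a summary over a namespace of ~100 million objects would keep on the order of 100 million set entries alive for the duration of the traversal.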

Attachments


- HDFS-11661.001.patch (10 kB), uploaded 26/Apr/17 18:52 by Wei-Chiu Chuang
- HDFs-11661.002.patch (10 kB), uploaded 28/Apr/17 20:19 by Wei-Chiu Chuang
- Heap growth.png (184 kB), uploaded 18/Apr/17 13:50 by Daryn Sharp

Issue Links

is broken by:
- HDFS-10797 Disk usage summary of snapshots causes renamed blocks to get counted twice (Resolved)

relates to:
- HDFS-11515 -du throws ConcurrentModificationException (Resolved)

People

Assignee: Wei-Chiu Chuang
Reporter: Nathan Roberts
Votes: 0
Watchers: 19

Dates

Created: 17/Apr/17 21:23
Updated: 25/Oct/19 20:24
Resolved: 25/May/17 01:23