[HDDS-4427] Avoid ContainerCache in ContainerReader at Datanode startup - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.1.0
Fix Version/s: 1.1.0
Component/s: Ozone Datanode
Labels:
- pull-request-available

Target Version/s: 1.1.0

Description

While testing on a dense datanode (200k containers, 45 disks), I see contention around the ContainerCache. Most of the time the reader threads run in parallel, but there are slowdowns where most of them are blocked waiting on the ContainerCache lock.

Examining jstack output, we can see that the RUNNABLE thread blocking the others is typically evicting a RocksDB instance from the cache:

"Thread-37" #131 prio=5 os_prio=0 tid=0x00007f8f49219800 nid=0x1c5e9 runnable [0x00007f86f7e78000]
   java.lang.Thread.State: RUNNABLE
        at org.rocksdb.RocksDB.closeDatabase(Native Method)
        at org.rocksdb.RocksDB.close(RocksDB.java:468)
        at org.apache.hadoop.hdds.utils.RocksDBStore.close(RocksDBStore.java:389)
        at org.apache.hadoop.ozone.container.common.utils.ReferenceCountedDB.cleanup(ReferenceCountedDB.java:79)
        at org.apache.hadoop.ozone.container.common.utils.ContainerCache.removeLRU(ContainerCache.java:106)
        at org.apache.commons.collections.map.LRUMap.addMapping(LRUMap.java:242)
        at org.apache.commons.collections.map.AbstractHashedMap.put(AbstractHashedMap.java:284)
        at org.apache.hadoop.ozone.container.common.utils.ContainerCache.getDB(ContainerCache.java:167)
        at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:63)
        at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:165)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:183)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:160)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:137)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
        at java.lang.Thread.run(Thread.java:748)
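To illustrate why one slow close stalls every reader, here is a minimal sketch of an LRU cache that closes evicted entries inside its own lock, mirroring the getDB → put → removeLRU → cleanup → close frames in the stack trace. It is not the real ContainerCache: it assumes a LinkedHashMap in access order instead of commons-collections LRUMap, leaves out ReferenceCountedDB's reference counting, and uses hypothetical DbHandle/DbCache classes whose close() sleeps 1 ms as a stand-in for RocksDB.close().

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a datanode container DB cache.
 * NOT the real ContainerCache: LinkedHashMap replaces LRUMap and there is
 * no reference counting.
 */
public class LruDbCacheSketch {

    /** Hypothetical DB handle; close() stands in for the native RocksDB close. */
    static final class DbHandle implements AutoCloseable {
        final String path;

        DbHandle(String path) { this.path = path; }

        @Override
        public void close() {
            // Simulate a ~1 ms close, the typical value measured in this issue.
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    static final class DbCache {
        private final Map<String, DbHandle> map;
        int closed = 0; // number of entries evicted and closed so far

        DbCache(int capacity) {
            // accessOrder = true makes iteration order least-recently-used first.
            this.map = new LinkedHashMap<String, DbHandle>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, DbHandle> eldest) {
                    if (size() > capacity) {
                        // Called from inside put(), so the caller still holds the cache lock.
                        eldest.getValue().close();
                        closed++;
                        return true;
                    }
                    return false;
                }
            };
        }

        /** Every caller serializes on this monitor, including the eviction's close(). */
        synchronized DbHandle getDb(String path) {
            DbHandle db = map.get(path);
            if (db == null) {
                db = new DbHandle(path);
                map.put(path, db); // may evict and close the LRU entry
            }
            return db;
        }

        synchronized int size() { return map.size(); }
    }

    public static void main(String[] args) {
        DbCache cache = new DbCache(2);
        for (int i = 0; i < 5; i++) {
            cache.getDb("/data/container-" + i);
        }
        System.out.println("cached=" + cache.size() + " evictedAndClosed=" + cache.closed);
    }
}
```

With a capacity of 2 and five distinct containers, this prints `cached=2 evictedAndClosed=3`. Because getDb is synchronized and the eviction runs inside put, every other thread calling getDb waits until the evicting thread's close() returns.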

The slowness appears to be driven by the RocksDB close call. It is generally fast, but often takes around 1 ms. For example, here are timings for that call after adding instrumentation to the code (each output line is a count followed by the close time in ms):

grep -a "metric: closing DB took" ozone-datanode.log | cut -d ":" -f 6 | sort -n | uniq -c
61940 0
128155 1
2786 2
236 3
53 4
42 5
17 6
10 7
8 8
15 9

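The timer only had millisecond precision, which is why many values are zero. Even at 1 ms per close, only about 1000 closes per second are possible, and because this point in the code is serialized under the cache lock, that ceiling applies to all reader threads combined rather than to each one.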
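The throughput argument can be checked against the histogram. The sketch below sums the recorded close times (count × ms for each line of the grep output) and compares the result with the 1000-closes-per-second ceiling for 200k containers. The class and method names are illustrative only, and because the timer had millisecond granularity, the totals are approximations rather than precise measurements.

```java
/** Re-does the arithmetic on the close-time histogram from this issue. */
public class CloseTimeEstimate {

    /** {count, recorded close time in ms}, copied from the grep output in this issue. */
    static final long[][] HISTOGRAM = {
        {61940, 0}, {128155, 1}, {2786, 2}, {236, 3}, {53, 4},
        {42, 5}, {17, 6}, {10, 7}, {8, 8}, {15, 9}
    };

    /** Returns {number of closes, sum of recorded close times in ms}. */
    static long[] summarize(long[][] histogram) {
        long closes = 0;
        long totalMs = 0;
        for (long[] row : histogram) {
            closes += row[0];
            totalMs += row[0] * row[1];
        }
        return new long[] {closes, totalMs};
    }

    public static void main(String[] args) {
        long[] s = summarize(HISTOGRAM);
        // If these closes run under the cache lock, as in the eviction path of the
        // stack trace, their times add up serially no matter how many threads run.
        System.out.printf("closes=%d recordedCloseTime=%d ms (~%d s)%n", s[0], s[1], s[1] / 1000);

        // Ceiling stated in the issue: ~1 ms per close means ~1000 closes per second.
        long containers = 200_000;
        long closesPerSecond = 1000;
        System.out.println("200k containers at 1000 closes/s: ~" + containers / closesPerSecond + " s");
    }
}
```

With the numbers from this issue, the histogram covers 193262 closes with a recorded total of 135228 ms (about 135 s), and the 1 ms ceiling puts 200k containers at roughly 200 s of close time alone.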

At startup there is no value in caching the open containers: all containers on the node need to be read in parallel, and each DB is only needed while its own container is being loaded. We should therefore simply open and close each container without caching the instance.
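The proposed startup behaviour can be sketched as follows: each reader task opens a container's DB, reads what it needs, and closes it itself with try-with-resources, so no shared lock is involved. This is a minimal sketch under stated assumptions, not the change made in the pull request: DbHandle, loadContainer, readAllVolumes and the placeholder metadata read are hypothetical, and the one-task-per-volume layout is an assumption suggested by the ContainerReader.readVolume frame in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: open, read and close each container DB without a cache. */
public class UncachedStartupReadSketch {

    /** Counts handles that are currently open, to show none are left behind. */
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Hypothetical DB handle standing in for a per-container RocksDB instance. */
    static final class DbHandle implements AutoCloseable {
        final String path;

        DbHandle(String path) {
            this.path = path;
            OPEN.incrementAndGet();
        }

        long readBlockCount() { return path.length(); } // placeholder metadata read

        @Override
        public void close() { OPEN.decrementAndGet(); }
    }

    /** Per-container startup work: the calling thread opens and closes the DB itself. */
    static long loadContainer(String path) {
        try (DbHandle db = new DbHandle(path)) { // no shared cache, no shared lock
            return db.readBlockCount();
        }
    }

    /** Runs one task per volume and returns the number of containers loaded. */
    static int readAllVolumes(List<List<String>> volumes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> containers : volumes) {
                results.add(pool.submit(() -> {
                    int loaded = 0;
                    for (String c : containers) {
                        loadContainer(c);
                        loaded++;
                    }
                    return loaded;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> volumes = new ArrayList<>();
        for (int v = 0; v < 4; v++) {
            List<String> containers = new ArrayList<>();
            for (int c = 0; c < 1000; c++) {
                containers.add("/disk" + v + "/container-" + c);
            }
            volumes.add(containers);
        }
        int loaded = readAllVolumes(volumes);
        System.out.println("loaded=" + loaded + " stillOpen=" + OPEN.get());
    }
}
```

Running main prints `loaded=4000 stillOpen=0`. A slow close now only delays the thread that owns that container, instead of every thread waiting on the cache lock.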

Issue Links

links to

GitHub Pull Request #1549

People

Assignee: Stephen O'Donnell

Reporter: Stephen O'Donnell

Votes: 0

Watchers: 2

Dates

Created: 03/Nov/20 12:56

Updated: 19/Nov/20 11:37

Resolved: 19/Nov/20 11:37