Details
- Type: Task
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.10.0, 3.3.0
- Fix Version/s: None
- Component/s: None
- Environment:
Hardware: 4-node cluster; each node has 4 cores (Xeon 2.5 GHz) and 25 GB of memory.
Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, RPC encryption + Data Transfer Encryption, Cloudera Navigator.
Description
Ran a few benchmarks and a profiler (VisualVM) today on an Observer-enabled cluster, and would like to share the results with the community. The cluster has one Observer node.
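For context, a Standby NameNode is placed into Observer state via haadmin; a minimal sketch, assuming a build with the SBN read feature and a hypothetical NameNode ID nn3:

hdfs haadmin -transitionToObserver nn3    # move the Standby into Observer state
hdfs haadmin -getServiceState nn3         # should now report "observer"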
NNThroughputBenchmark
The benchmark generates 1 million files and sends fileStatus RPCs:
hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs <namenode> -op fileStatus -threads 100 -files 1000000 -useExisting -keepResults
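The per-node invocations are not shown above; presumably the Observer numbers come from pointing -fs at the Observer's RPC address instead of the Active's. A sketch with hypothetical hostnames and the default 8020 RPC port:

hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs://nn1.example.com:8020 -op fileStatus -threads 100 -files 1000000 -useExisting -keepResults    # Active NN
hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs://nn3.example.com:8020 -op fileStatus -threads 100 -files 1000000 -useExisting -keepResults    # Observer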
Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled:
Node | fileStatus (ops per sec)
---|---
Active NameNode | 4865
Observer | 3996
Kerberos, SSL:
Node | fileStatus (ops per sec)
---|---
Active NameNode | 7078
Observer | 6459
Observations:
- Due to the edit tailing overhead, the Observer node consumes 30% CPU even when the cluster is idle.
- While the Active NN has an RPC processing time of less than 1 ms, the Observer node's is over 5 ms. I am still looking for the source of the longer processing time; it may be the cause of the performance degradation relative to the Active NN (one way to compare the two is the JMX query sketched after this list). Note that the cluster has Cloudera Navigator installed, which adds extra overhead to RPC processing time.
- GlobalStateIdContext#isCoordinatedCall() shows up as one of the top hotspots in the profiler.
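Outside the profiler, per-node RPC processing time can be compared via the NameNode's RPC JMX metrics; a sketch assuming a hypothetical hostname, the default 9870 HTTP port (50070 on Hadoop 2.x), and an 8020 RPC port:

curl -s 'http://nn3.example.com:9870/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020' | grep -E 'RpcProcessingTimeAvgTime|RpcQueueTimeAvgTime'    # average ms per RPC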
Attachments
Issue Links
- is related to:
  - HDFS-14276 [SBN read] Reduce tailing overhead (Resolved)
- relates to:
  - HDFS-14822 [SBN read] Revisit GlobalStateIdContext locking when getting server state id (Resolved)
  - HDFS-14858 [SBN read] Allow configurably enable/disable AlignmentContext on NameNode (Resolved)