Details
- Type: Task
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.10.0, 3.3.0
- Fix Version/s: None
- Component/s: None
- Environment:
Hardware: 4-node cluster; each node has 4 cores (Xeon 2.5 GHz) and 25 GB of memory.
Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, RPC encryption + Data Transfer Encryption, Cloudera Navigator.
Description
Ran a few benchmarks and a profiler (VisualVM) today on an Observer-enabled cluster, and would like to share the results with the community. The cluster has one Observer node.
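For context, a Standby NameNode is placed into Observer state via haadmin; a minimal sketch, assuming a build with the SBN read feature and a hypothetical NameNode ID nn3:

hdfs haadmin -transitionToObserver nn3    # move the Standby into Observer state
hdfs haadmin -getServiceState nn3         # should now report "observer"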
NNThroughputBenchmark
The benchmark generates 1 million files and sends fileStatus RPCs:
hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs <namenode> -op fileStatus -threads 100 -files 1000000 -useExisting -keepResults
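The per-node invocations are not shown above; presumably the Observer numbers come from pointing -fs at the Observer's RPC address instead of the Active's. A sketch with hypothetical hostnames and the default 8020 RPC port:

hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs://nn1.example.com:8020 -op fileStatus -threads 100 -files 1000000 -useExisting -keepResults    # Active NN
hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs://nn3.example.com:8020 -op fileStatus -threads 100 -files 1000000 -useExisting -keepResults    # Observer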
Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled:
Node | fileStatus (ops per sec)
---|---
Active NameNode | 4865
Observer | 3996
Kerberos, SSL:
Node | fileStatus (ops per sec)
---|---
Active NameNode | 7078
Observer | 6459
Observations:
- Due to the edit tailing overhead, the Observer node consumes 30% CPU even when the cluster is idle.
- While the Active NN has an RPC processing time of less than 1 ms, the Observer node's is over 5 ms. I am still looking for the source of the longer processing time; it may be the cause of the performance degradation relative to the Active NN (one way to compare the two is the JMX query sketched after this list). Note that the cluster has Cloudera Navigator installed, which adds extra overhead to RPC processing time.
- GlobalStateIdContext#isCoordinatedCall() shows up as one of the top hotspots in the profiler.
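Outside the profiler, per-node RPC processing time can be compared via the NameNode's RPC JMX metrics; a sketch assuming a hypothetical hostname, the default 9870 HTTP port (50070 on Hadoop 2.x), and an 8020 RPC port:

curl -s 'http://nn3.example.com:9870/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020' | grep -E 'RpcProcessingTimeAvgTime|RpcQueueTimeAvgTime'    # average ms per RPC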
Attachments
Issue Links
- is related to:
  - HDFS-14276 [SBN read] Reduce tailing overhead (Resolved)
- relates to:
  - HDFS-14822 [SBN read] Revisit GlobalStateIdContext locking when getting server state id (Resolved)
  - HDFS-14858 [SBN read] Allow configurably enable/disable AlignmentContext on NameNode (Resolved)