[HDFS-10203] Excessive topology lookup for large number of client machines in namenode - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

In the ClientProtocol#getBlockLocations call, DatanodeManager computes the network distance between the client machine and the datanodes. As part of that, it needs to resolve the network location of the client machine. If the client machine isn't a datanode, it needs to ask DNSToSwitchMapping to resolve it.

  public void sortLocatedBlocks(final String targethost,
      final List<LocatedBlock> locatedblocks) {
    // Sort the blocks.
    // The client may be a node manager on a host that runs no datanode,
    // so look up a generic Node here rather than a datanode only.
    Node client = getDatanodeByHost(targethost);
    if (client == null) {
      // Not a known datanode: resolve its network location via the mapping.
      List<String> hosts = new ArrayList<>(1);
      hosts.add(targethost);
      List<String> resolvedHosts = dnsToSwitchMapping.resolve(hosts);
      if (resolvedHosts != null && !resolvedHosts.isEmpty()) {
        ....
      }
    }
  }

When tens of thousands of non-datanode client machines hit a namenode that uses ScriptBasedMapping, this causes the following issues:

After namenode startup, CachedDNSToSwitchMapping has only datanodes in its cache. Calls from many distinct client machines therefore each miss the cache and fork a process to run the topology script, which can cause a spike in namenode load.
The CachedDNSToSwitchMapping cache can grow large. Under a normal workload, say fewer than 100k client machines with each cache entry using less than 200 bytes, the cache takes at most about 20 MB (100,000 × 200 bytes), which is small compared to the NN's heap size. In theory, though, it can still exhaust the NN's heap if there is a misconfiguration or a malicious attack involving millions of IP addresses.
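
One possible way to address the unbounded cache growth described above is to put a size-bounded LRU cache in front of the expensive resolver. The following is only a minimal sketch under stated assumptions, not a proposed patch: BoundedTopologyCache, its constructor parameters, and the resolver function are hypothetical names for illustration and are not part of Hadoop. It relies only on java.util.LinkedHashMap in access order with removeEldestEntry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical size-bounded LRU cache in front of an expensive
 * host-to-network-location resolver (for example, one that forks a script).
 * Illustration only; this class does not exist in Hadoop.
 */
final class BoundedTopologyCache {
  private final Map<String, String> cache;
  private final Function<String, String> resolver;
  private long resolverCalls = 0;

  BoundedTopologyCache(final int maxEntries,
      final Function<String, String> resolver) {
    this.resolver = resolver;
    // accessOrder=true keeps entries ordered from least to most recently used.
    this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Evict the least recently used entry once the bound is exceeded.
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached location, calling the resolver only on a miss. */
  synchronized String resolve(final String host) {
    String location = cache.get(host);
    if (location == null) {
      // Simplification: the lock is held while the resolver runs.
      resolverCalls++;
      location = resolver.apply(host);
      cache.put(host, location);
    }
    return location;
  }

  synchronized int entryCount() {
    return cache.size();
  }

  synchronized long resolverCalls() {
    return resolverCalls;
  }
}
```

With a bound of 100,000 entries and the estimate of under 200 bytes per entry given above, such a cache would stay at roughly 20 MB no matter how many distinct client addresses reach the namenode. The trade-off is that evicted clients trigger the resolver again, so a bound alone does not remove the load spike after startup.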

People

Assignee: Unassigned

Reporter: Ming Ma

Votes: 0

Watchers: 8

Dates

Created: 24/Mar/16 05:52

Updated: 30/Mar/16 06:01