Hadoop Common / HADOOP-2060

DFSClient should choose a block that is local to the node where the client is running



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid


      While chasing down the DFSClient code to see how data locality affects DFS read
      throughput, I realized that DFSClient does not use data locality information
      (at least not obviously) when it chooses which replica of a block to read from.
      Here is the relevant code:

   /**
    * Pick the best node from which to stream the data.
    * Entries in <i>nodes</i> are already in the priority order.
    */
   private DatanodeInfo bestNode(DatanodeInfo nodes[],
                                 AbstractMap<DatanodeInfo, DatanodeInfo> deadNodes)
                                 throws IOException {
     if (nodes != null) {
       for (int i = 0; i < nodes.length; i++) {
         if (!deadNodes.containsKey(nodes[i])) {
           return nodes[i];  // first replica not known to be dead wins
         }
       }
     }
     throw new IOException("No live nodes contain current block");
   }

   private DNAddrPair chooseDataNode(LocatedBlock block)
     throws IOException {
     int failures = 0;
     while (true) {
       DatanodeInfo[] nodes = block.getLocations();
       try {
         DatanodeInfo chosenNode = bestNode(nodes, deadNodes);
         InetSocketAddress targetAddr = DataNode.createSocketAddr(chosenNode.getName());
         return new DNAddrPair(chosenNode, targetAddr);
       } catch (IOException ie) {
         String blockInfo = block.getBlock() + " file=" + src;
         if (failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
           throw new IOException("Could not obtain block: " + blockInfo);
         }
         if (nodes == null || nodes.length == 0) {
           LOG.info("No node available for block: " + blockInfo);
         }
         LOG.info("Could not obtain block " + block.getBlock() + " from any node:  " + ie);
         try {
           Thread.sleep(3000);  // back off before retrying
         } catch (InterruptedException iex) {
         }
         deadNodes.clear(); //2nd option is to remove only nodes[blockId]
         failures++;
       }
     }
   }

      bestNode() simply returns the first replica that is not in deadNodes.
      This means that even when one replica is local to the node where the client
      is running, the client may well pick a remote replica instead.
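
      For illustration, here is a minimal sketch of a locality-preferring variant of
      bestNode(). The isLocalHost() helper is hypothetical (it is not part of DFSClient),
      and comparing resolved addresses is a simplification; a real fix would presumably
      reuse the namenode's network topology. It assumes java.net.InetAddress and
      java.net.UnknownHostException are imported, and that DatanodeInfo exposes its
      hostname via getHost().

   // Sketch: prefer a live replica on the local host, then fall back to
   // the current first-live-node behavior.
   private DatanodeInfo bestNode(DatanodeInfo nodes[],
                                 AbstractMap<DatanodeInfo, DatanodeInfo> deadNodes)
                                 throws IOException {
     if (nodes != null) {
       // First pass: a live replica on this very host, if any.
       for (int i = 0; i < nodes.length; i++) {
         if (!deadNodes.containsKey(nodes[i]) && isLocalHost(nodes[i].getHost())) {
           return nodes[i];
         }
       }
       // Second pass: the first live replica, exactly as today.
       for (int i = 0; i < nodes.length; i++) {
         if (!deadNodes.containsKey(nodes[i])) {
           return nodes[i];
         }
       }
     }
     throw new IOException("No live nodes contain current block");
   }

   // Hypothetical helper: resolve the datanode's hostname and compare it
   // with this machine's address; unresolvable hosts are treated as remote.
   private static boolean isLocalHost(String host) {
     try {
       InetAddress addr = InetAddress.getByName(host);
       return addr.isLoopbackAddress() || addr.equals(InetAddress.getLocalHost());
     } catch (UnknownHostException e) {
       return false;
     }
   }

      Keeping the fallback pass means the change is purely a preference: when no
      local replica is live, the client behaves exactly as it does now.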

      Map/reduce tries to schedule each mapper on a node that holds a copy of the
      mapper's input split. However, if the DFSClient does not use that information
      when choosing a replica to read, the mapper may well end up reading its data
      over the network even though a local replica is available.
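
      For context, here is a small, self-contained sketch of the locality information
      the framework already has for each split. The path and hostnames are made up,
      and I am assuming a FileSplit constructor that accepts an explicit host list:

   import java.io.IOException;

   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.mapred.FileSplit;

   // The scheduler places mappers using exactly this per-split host list;
   // the DFSClient could compare the same hostnames against its own host
   // when it picks a replica to read.
   public class SplitLocations {
     public static void main(String[] args) throws IOException {
       String[] replicaHosts = { "node1.example.com", "node2.example.com" };
       FileSplit split = new FileSplit(new Path("/user/demo/part-00000"),
                                       0L, 64L * 1024 * 1024, replicaHosts);
       for (String host : split.getLocations()) {
         System.out.println("replica on: " + host);
       }
     }
   }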

      I hope I have missed something and misunderstood the code.
      Otherwise, this will be a serious performance problem.




            Assignee: Unassigned
            Reporter: Runping Qi