  1. Hadoop Common
  2. HADOOP-814

Increase dfs scalability by optimizing locking on namenode.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

    Description

      The current DFS namenode encounters locking bottlenecks when the number of datanodes is large. The namenode uses a single global lock to protect access to its data structures. One key area is heartbeat processing: the lower the cost of processing a heartbeat, the more nodes HDFS can support. A simple change to the current locking model can increase scalability. Here are the details:

      Case 1: Currently we have three locks: the global lock (on FSNamesystem), the heartbeat lock and the datanodeMap lock. The following function is called when the namenode receives a heartbeat:

      public synchronized FSNamesystem.gotHeartbeat() {   // (A)
        synchronized (heartbeats) {                        // (B)
          synchronized (datanodeMap) {                     // (C)
            ...
          }
        }
      }

      In the code above, statement (A) acquires the global FSNamesystem lock. This synchronization can be safely removed (updateStats should be removed too). A heartbeat from a datanode can then be processed without holding the global FSNamesystem lock.
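
      A minimal sketch of what Case 1 could look like after the change, not the actual patch. The class HeartbeatSketch, the String node ids, the timestamp values and the lastHeartbeat helper are assumptions made for illustration; only gotHeartbeat, the heartbeat lock and datanodeMap come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FSNamesystem, reduced to heartbeat handling.
class HeartbeatSketch {
    // Lock (B): guards heartbeat bookkeeping.
    private final Object heartbeats = new Object();
    // Lock (C): the datanode map, here simplified to node id -> last heartbeat time (ms).
    private final Map<String, Long> datanodeMap = new HashMap<>();

    // Not declared synchronized, so lock (A), the monitor on `this`, is never taken.
    void gotHeartbeat(String nodeId, long nowMillis) {
        synchronized (heartbeats) {          // (B)
            synchronized (datanodeMap) {     // (C)
                datanodeMap.put(nodeId, nowMillis);
            }
        }
    }

    // Illustrative read helper; returns null for an unknown node.
    Long lastHeartbeat(String nodeId) {
        synchronized (datanodeMap) {
            return datanodeMap.get(nodeId);
        }
    }

    public static void main(String[] args) {
        HeartbeatSketch ns = new HeartbeatSketch();
        ns.gotHeartbeat("dn1", 100L);
        System.out.println("dn1 last seen at " + ns.lastHeartbeat("dn1"));
    }
}
```

      Because gotHeartbeat never takes the monitor on the namesystem object, a long-running operation that holds the global lock no longer delays heartbeat processing in this sketch.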

      Case 2: A thread called the heartbeatCheck thread periodically traverses all known datanodes to determine whether any of them has timed out. It has the following form:

      void FSNamesystem.heartbeatCheck() {
        synchronized (this) {            // (D)
          synchronized (heartbeats) {    // (E)
            ...
          }
        }
      }

      This thread acquires the global FSNamesystem lock in statement (D). Statement (D) can be removed. Instead, the loop can check whether any nodes are dead, and acquire the global FSNamesystem lock only if a dead node is found.
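
      One way the Case 2 idea might be sketched, under stated assumptions: the class HeartbeatCheckSketch, the expiryMillis timeout, the String node ids and the liveCount helper are hypothetical. The sketch also assumes that the heartbeats lock is released before the global lock is taken, so the original lock order (global first, then heartbeats) is preserved, and that the node is re-checked once both locks are held. The description above does not specify either detail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FSNamesystem, reduced to the timeout check.
class HeartbeatCheckSketch {
    // Lock (E) and its data: node id -> last heartbeat time (ms).
    private final Map<String, Long> heartbeats = new HashMap<>();
    private final long expiryMillis; // assumed timeout after which a node counts as dead

    HeartbeatCheckSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void gotHeartbeat(String nodeId, long nowMillis) {
        synchronized (heartbeats) {
            heartbeats.put(nodeId, nowMillis);
        }
    }

    int liveCount() {
        synchronized (heartbeats) {
            return heartbeats.size();
        }
    }

    private boolean isDead(long lastSeen, long nowMillis) {
        return nowMillis - lastSeen > expiryMillis;
    }

    // Removes timed-out nodes and returns how many were removed.
    int heartbeatCheck(long nowMillis) {
        int removed = 0;
        while (true) {
            String dead = null;
            synchronized (heartbeats) {              // (E) only: cheap scan
                for (Map.Entry<String, Long> e : heartbeats.entrySet()) {
                    if (isDead(e.getValue(), nowMillis)) {
                        dead = e.getKey();
                        break;
                    }
                }
            }
            if (dead == null) {
                return removed;                      // common case: global lock never taken
            }
            synchronized (this) {                    // (D): only after a dead node was seen
                synchronized (heartbeats) {
                    Long last = heartbeats.get(dead);
                    // Re-check: the node may have sent a heartbeat in between.
                    if (last != null && isDead(last, nowMillis)) {
                        heartbeats.remove(dead);
                        removed++;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        HeartbeatCheckSketch ns = new HeartbeatCheckSketch(1000L);
        ns.gotHeartbeat("dn1", 0L);
        ns.gotHeartbeat("dn2", 900L);
        System.out.println("removed: " + ns.heartbeatCheck(1500L));
    }
}
```

      When no node has timed out, the check in this sketch touches only the heartbeats lock, so the periodic scan does not contend with other namenode operations.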

      Fixing the above two cases may allow HDFS to scale to a larger number of nodes.

      Attachments

        1. heartbeatlock3.patch
          6 kB
          Dhruba Borthakur

          People

            Assignee: Dhruba Borthakur (dhruba)
            Reporter: Dhruba Borthakur (dhruba)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:
