Hadoop Common / HADOOP-1269

DFS Scalability: namenode throughput impacted because of global FSNamesystem lock

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

      Description

      I have been running a 2000-node cluster and measuring namenode performance. The namenode log contains quite a few "Calls dropped" messages. The namenode machine has 4 CPUs, each about 30% busy. Profiling the namenode shows that the methods that consume the most CPU are addStoredBlock() and getAdditionalBlock(). The first is invoked when a datanode confirms the presence of a newly created block; the second is invoked when a DFSClient requests a new block for a file.
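      To make the contention concrete, here is a heavily simplified Java model of the locking the report describes. It assumes that both hot methods synchronize on the single FSNamesystem instance (the "global FSNamesystem lock" of the title); the class name CoarseNamesystem, the map value types, and the replicaCount() helper are hypothetical, and the chooseTarget() call made by the real getAdditionalBlock() is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of the coarse-grained scheme:
// every method synchronizes on the same object, so all RPC handler
// threads running any of them are serialized on one monitor.
class CoarseNamesystem {
    private final Map<Long, List<String>> blocksMap = new HashMap<>();      // block id -> datanodes with a replica
    private final Map<String, List<Long>> pendingCreates = new HashMap<>(); // path -> blocks of a file being written
    private long nextBlockId = 1;

    // A DFSClient asks for a new block for a file it is writing.
    synchronized long getAdditionalBlock(String path) {
        // The real method also calls chooseTarget() to pick datanodes; omitted here.
        long id = nextBlockId++;
        pendingCreates.computeIfAbsent(path, p -> new ArrayList<>()).add(id);
        blocksMap.put(id, new ArrayList<>());
        return id;
    }

    // A datanode confirms that it now stores a replica of a block.
    synchronized void addStoredBlock(long blockId, String datanode) {
        List<String> nodes = blocksMap.get(blockId);
        if (nodes != null && !nodes.contains(datanode)) {
            nodes.add(datanode);
        }
    }

    // Helper for inspection: number of distinct datanodes reporting a block.
    synchronized int replicaCount(long blockId) {
        List<String> nodes = blocksMap.get(blockId);
        return nodes == null ? 0 : nodes.size();
    }
}
```

      Because every method shares one monitor, a handler thread recording a replica in addStoredBlock() must wait for any thread allocating a block in getAdditionalBlock(), even when the two touch unrelated entries.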

      I am attaching two files generated by the profiler. serverThreads40.html was captured with the namenode running 40 server handler threads; serverThreads1.html was captured with a single server handler thread (and a max_queue_size of 4000).

      With 40 handler threads, the total elapsed time spent in FSNamesystem.getAdditionalBlock() is 1957 seconds, whereas the method it invokes, chooseTarget(), takes only about 97 seconds. For the remaining 1860 seconds, FSNamesystem.getAdditionalBlock() is blocked waiting for the global FSNamesystem lock.
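      The 1860-second figure comes from attributing everything inside getAdditionalBlock() that is not chooseTarget() to lock waiting, which assumes the rest of the method body is negligible, as the profile suggests:

```latex
t_{\mathrm{wait}} \approx t_{\mathrm{getAdditionalBlock}} - t_{\mathrm{chooseTarget}}
  = 1957\,\mathrm{s} - 97\,\mathrm{s} = 1860\,\mathrm{s},
\qquad
\frac{t_{\mathrm{wait}}}{t_{\mathrm{getAdditionalBlock}}} = \frac{1860}{1957} \approx 0.95
```

      So roughly 95% of the time attributed to getAdditionalBlock() in the 40-thread profile is spent waiting for the lock.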

      My proposal is to implement a finer-grained locking model in the namenode. The FSNamesystem has a few important data structures, e.g. blocksMap, datanodeMap, leases, neededReplication, pendingCreates, and heartbeats, and many of them already have their own lock. I propose giving each of these data structures its own lock, which protects the integrity of that structure's contents. The global FSNamesystem lock is still needed to maintain consistency across different data structures.

      With this scheme, neither addStoredBlock() nor getAdditionalBlock() needs to hold the global FSNamesystem lock. startFile() and closeFile() still need to acquire it, because they must keep multiple data structures consistent with one another.
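      A minimal Java sketch of the proposed scheme follows. It is not the attached chooseTargetLock2.patch. It assumes each structure is guarded by its own monitor (the map object itself) plus a separate global lock, and it assumes a lock order of "global lock first, then at most one per-structure lock at a time" so that threads cannot deadlock. The class name, the files map standing in for the namespace of closed files, and the helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of finer-grained locking (not the attached patch).
// Lock order: globalLock first, then at most one per-structure lock at a time.
class FineGrainedNamesystem {
    // Held only by operations that update several structures together.
    private final Object globalLock = new Object();

    // Each map is guarded by its own monitor: the map object itself.
    private final Map<Long, List<String>> blocksMap = new HashMap<>();      // block id -> datanodes with a replica
    private final Map<String, List<Long>> pendingCreates = new HashMap<>(); // path -> blocks of a file being written
    private final Map<String, String> leases = new HashMap<>();             // path -> lease holder
    private final Map<String, List<Long>> files = new HashMap<>();          // hypothetical: closed file -> blocks
    private final AtomicLong nextBlockId = new AtomicLong(1);               // lock-free block id counter

    // Touches files, pendingCreates and leases together: takes the global lock.
    boolean startFile(String path, String holder) {
        synchronized (globalLock) {
            synchronized (files) {
                if (files.containsKey(path)) {
                    return false;
                }
            }
            synchronized (pendingCreates) {
                if (pendingCreates.containsKey(path)) {
                    return false;
                }
                pendingCreates.put(path, new ArrayList<>());
            }
            synchronized (leases) {
                leases.put(path, holder);
            }
            return true;
        }
    }

    // No global lock: holds the pendingCreates lock, releases it, then takes blocksMap.
    long getAdditionalBlock(String path) {
        long id;
        synchronized (pendingCreates) {
            List<Long> blocks = pendingCreates.get(path);
            if (blocks == null) {
                throw new IllegalStateException("file not open: " + path);
            }
            id = nextBlockId.getAndIncrement();
            blocks.add(id);
        }
        // Between the two sections, another thread may see the block in
        // pendingCreates but not yet in blocksMap; this sketch tolerates that.
        synchronized (blocksMap) {
            blocksMap.put(id, new ArrayList<>());
        }
        return id;
    }

    // Only blocksMap is involved, so only its lock is taken.
    void addStoredBlock(long blockId, String datanode) {
        synchronized (blocksMap) {
            List<String> nodes = blocksMap.get(blockId);
            if (nodes != null && !nodes.contains(datanode)) {
                nodes.add(datanode);
            }
        }
    }

    // Moves a file from pendingCreates to files and drops its lease: takes the global lock.
    boolean closeFile(String path) {
        synchronized (globalLock) {
            List<Long> blocks;
            synchronized (pendingCreates) {
                blocks = pendingCreates.remove(path);
            }
            if (blocks == null) {
                return false;
            }
            synchronized (leases) {
                leases.remove(path);
            }
            synchronized (files) {
                files.put(path, blocks);
            }
            return true;
        }
    }

    int replicaCount(long blockId) {
        synchronized (blocksMap) {
            List<String> nodes = blocksMap.get(blockId);
            return nodes == null ? 0 : nodes.size();
        }
    }

    boolean isClosed(String path) {
        synchronized (files) {
            return files.containsKey(path);
        }
    }
}
```

      In this sketch, startFile() and closeFile() hold the global lock so that their multi-structure updates cannot interleave with each other, while addStoredBlock() and getAdditionalBlock() only ever wait on the lock of the structure they are touching at that moment.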

      Attachments

      1. chooseTargetLock2.patch (30 kB, dhruba borthakur)
      2. serverThreads1.html (35 kB, dhruba borthakur)
      3. serverThreads40.html (34 kB, dhruba borthakur)

        Activity

        dhruba borthakur created issue
        dhruba borthakur made changes (Field: Original Value → New Value)
          Attachment: serverThreads1.html [ 12355786 ]
          Attachment: serverThreads40.html [ 12355785 ]
        dhruba borthakur made changes
          Assignee: dhruba borthakur [ dhruba ]
        dhruba borthakur made changes
          Summary: DFS Scalability: namenode throughout impacted becuase of global FSNamesystem lock → DFS Scalability: namenode throughput impacted becuase of global FSNamesystem lock
        dhruba borthakur made changes
          Attachment: chooseTargetLock.patch [ 12358289 ]
        dhruba borthakur made changes
          Attachment: chooseTargetLock2.patch [ 12358925 ]
        dhruba borthakur made changes
          Attachment: chooseTargetLock.patch [ 12358289 ] (removed)
        dhruba borthakur made changes
          Status: Open [ 1 ] → Patch Available [ 10002 ]
        Doug Cutting made changes
          Status: Patch Available [ 10002 ] → Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
          Fix Version/s: 0.14.0 [ 12312474 ]
        Doug Cutting made changes
          Status: Resolved [ 5 ] → Closed [ 6 ]
        Owen O'Malley made changes
          Component/s: dfs [ 12310710 ]

          People

          • Assignee: dhruba borthakur
          • Reporter: dhruba borthakur
          • Votes: 0
          • Watchers: 0
