Hadoop Common / HADOOP-3673

Deadlock in Datanode RPC servers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There is a deadlock scenario in the way Lease Recovery is triggered using the Datanode RPC server via HADOOP-3310.

      Each Datanode has dfs.datanode.handler.count handler threads (default of 3). These handler threads are used to support the generation-stamp-dance protocol as described in HADOOP-1700.

      Let me try to explain the scenario with an example. Suppose a cluster has two datanodes, and assume that dfs.datanode.handler.count is set to 1. Suppose there are two clients, each writing to a separate file with a replication factor of 2. Assume that both clients encounter an IO error and trigger the generation-stamp-dance protocol. The first client may invoke recoverBlock on the first datanode while the second client invokes recoverBlock on the second datanode. Each datanode will then try to make a getBlockMetaDataInfo() call to the other datanode. But since each datanode has only one server handler thread, and that thread is already occupied by recoverBlock, neither getBlockMetaDataInfo() call can be served and both threads will block forever. Deadlock!
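
      The scenario above can be reproduced in miniature without any Hadoop code. The following is only a sketch under stated assumptions: the class HandlerDeadlockSketch, the Node type, and the method recoveryCompletes are hypothetical names invented for illustration, and a fixed-size thread pool stands in for the datanode's RPC handler threads. It is not the actual Datanode implementation, and the timeout exists only so that the demonstration terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerDeadlockSketch {

    /** Hypothetical stand-in for a datanode whose RPC server has a fixed handler pool. */
    static final class Node {
        final ExecutorService handlers;
        Node peer;

        Node(int handlerCount) {
            // Models dfs.datanode.handler.count handler threads.
            handlers = Executors.newFixedThreadPool(handlerCount);
        }

        /** Models recoverBlock: occupies a local handler and blocks on a call to the peer. */
        Future<String> recoverBlock(CountDownLatch bothStarted) {
            return handlers.submit(() -> {
                bothStarted.countDown();
                bothStarted.await(); // make sure both recoveries hold a handler first
                return peer.getBlockMetaDataInfo().get();
            });
        }

        /** Models getBlockMetaDataInfo: needs a free handler on this node to be served. */
        Future<String> getBlockMetaDataInfo() {
            return handlers.submit(() -> "meta");
        }
    }

    /** Returns true if both recoveries finish within the timeout, false if they stall. */
    public static boolean recoveryCompletes(int handlerCount, long timeoutMillis)
            throws Exception {
        Node a = new Node(handlerCount);
        Node b = new Node(handlerCount);
        a.peer = b;
        b.peer = a;
        CountDownLatch bothStarted = new CountDownLatch(2);
        Future<String> ra = a.recoverBlock(bothStarted);
        Future<String> rb = b.recoverBlock(bothStarted);
        try {
            ra.get(timeoutMillis, TimeUnit.MILLISECONDS);
            rb.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // each handler is waiting on the other node's busy handler
        } finally {
            // Interrupt any stuck handlers so the JVM can exit.
            a.handlers.shutdownNow();
            b.handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 handler:  completes=" + recoveryCompletes(1, 500));
        System.out.println("2 handlers: completes=" + recoveryCompletes(2, 500));
    }
}
```

      With one handler per node, each recovery holds the only handler while waiting for the other node to serve its request, so neither finishes. Two handlers are enough for this particular two-client case, but a larger handler count only raises the number of concurrent recoveries needed to exhaust the pool; it does not remove the underlying cycle.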

        Attachments

        1. 3673_20080707b_0.18.patch
          10 kB
          Tsz-wo Sze
        2. 3673_20080707b.patch
          10 kB
          Tsz-wo Sze
        3. 3673_20080707.patch
          9 kB
          Tsz-wo Sze
        4. 3673_20080702e.patch
          9 kB
          Tsz-wo Sze
        5. 3673_20080702d.patch
          9 kB
          Tsz-wo Sze
        6. 3673_20080702c.patch
          7 kB
          Tsz-wo Sze
        7. 3673_20080702b.patch
          5 kB
          Tsz-wo Sze
        8. 3673_20080702.patch
          4 kB
          Tsz-wo Sze

              People

              • Assignee: Tsz-wo Sze (szetszwo)
              • Reporter: Dhruba Borthakur (dhruba)