Hadoop Common
HADOOP-774

Datanode fails to heartbeat when a directory with a large number of blocks is deleted

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

      Description

      If a user removes a few huge files, the namenode sends a BlockInvalidate command to the relevant Datanodes. The Datanode processes the blockInvalidate command as part of its heartbeat thread. If the number of blocks to be invalidated is large, the datanode takes a long time to process the command, and so stops sending new heartbeats to the namenode. The namenode then declares the datanode dead!

      1. One option is to process the blockInvalidate command in a separate thread from the heartbeat thread in the Datanode.
      2. Another option is to constrain the namenode to send at most a fixed number of blocks (e.g. 500) per blockInvalidate message.
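Option 2 amounts to chunking the invalidate set on the namenode side. A minimal sketch of that idea follows; the class and method names here are illustrative, not taken from the actual Hadoop patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Option 2: cap the number of blocks per blockInvalidate
// message so the datanode's heartbeat thread never has to process a
// huge batch in one go. Names are hypothetical, not from the patch.
public class InvalidateChunker {
    // Suggested cap from the issue description.
    static final int MAX_BLOCKS_PER_MESSAGE = 500;

    // Split the full invalidate set into messages of at most the cap.
    static List<List<Long>> chunk(List<Long> blockIds) {
        List<List<Long>> messages = new ArrayList<>();
        for (int i = 0; i < blockIds.size(); i += MAX_BLOCKS_PER_MESSAGE) {
            int end = Math.min(i + MAX_BLOCKS_PER_MESSAGE, blockIds.size());
            messages.add(new ArrayList<>(blockIds.subList(i, end)));
        }
        return messages;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long b = 0; b < 1200; b++) ids.add(b);
        List<List<Long>> msgs = chunk(ids);
        // 1200 blocks -> 3 messages: 500 + 500 + 200
        System.out.println(msgs.size() + " " + msgs.get(2).size());
    }
}
```

With the cap in place, the remaining blocks stay in the namenode's pending-invalidate bookkeeping and go out on subsequent heartbeats.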


          Activity

          Doug Cutting added a comment -

          I like (2), make the namenode limit the number of blocks per blockInvalidate message.

          dhruba borthakur added a comment -

          1. Option 2 is simple to implement, with very few code changes.

          2. The namenode might not have detailed knowledge of a Datanode's computing resources. A Datanode with a faster disk could process a blockInvalidate message faster than one with slower disk I/O. It is therefore best left to the datanodes to decide how much computing power to spend on deleting blocks. This votes for Option 1.

          3. Option 1 seems more scalable because the namenode has to do bookkeeping for invalidatedNodes for less time. Once the namenode sends out the blockInvalidate command, it can remove the blocks from the recentInvalidateSets map immediately, so that memory can be reused earlier. Under Option 2, it might take a while before all the blocks are sent to the Datanode, and the memory for those block objects hangs around longer.

          4. With Option 2, the speed of block invalidation depends on the heartbeat frequency, so changing the heartbeat frequency could have an unintended side effect on the block reclamation speed of the cluster.

          5. Option 1 has the disadvantage that the Datanode has to throttle the creation of new threads (and queue work if needed) when many individual block-invalidate commands arrive within a short period of time. This code might be non-trivial.

          Given the above, I would go ahead and implement Option 2.

          Raghu Angadi added a comment -

          I think (1) is simpler, from the namenode's point of view.

          One modification for (1) is that the Datanode should first delete the mapping inline and queue the physical file deletion on a separate thread. This is required so that the Datanode does not report the already-invalidated blocks in its next heartbeat.

          Of course, the datanode should not create one thread for each RPC call. It could delete the files inline if the number is small (say < 20), and otherwise queue them up to be deleted by a thread (creating one if none exists). The thread exits when there are no more files to be deleted.
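Raghu's refinement can be sketched as a small lazily-threaded deleter: drop the block-to-file mapping inline so the block disappears from reports immediately, then unlink the files either inline (small batches) or on an on-demand worker thread that exits when idle. All names here are illustrative, not the real Datanode code.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the modified Option 1: mapping removed inline, physical
// deletes queued to a lazily created thread. Hypothetical class.
public class AsyncBlockDeleter {
    static final int INLINE_LIMIT = 20;      // small batches deleted inline

    private final Map<Long, File> blockMap = new HashMap<>();
    private final Deque<File> pending = new ArrayDeque<>();
    Thread worker;                           // created on demand, exits when idle

    synchronized void addBlock(long id, File f) { blockMap.put(id, f); }
    synchronized boolean hasBlock(long id) { return blockMap.containsKey(id); }

    void invalidate(long[] ids) {
        // Step 1: drop mappings inline so heartbeats/block reports stop
        // advertising these blocks immediately.
        List<File> files = new ArrayList<>();
        synchronized (this) {
            for (long id : ids) {
                File f = blockMap.remove(id);
                if (f != null) files.add(f);
            }
        }
        if (files.size() <= INLINE_LIMIT) {
            for (File f : files) f.delete();  // cheap case: delete inline
            return;
        }
        // Step 2: queue the slow disk unlinks; start a worker if none runs.
        synchronized (this) {
            pending.addAll(files);
            if (worker == null) {
                worker = new Thread(this::drain, "blockDeleter");
                worker.start();
            }
        }
    }

    private void drain() {
        while (true) {
            File f;
            synchronized (this) {
                f = pending.poll();
                if (f == null) { worker = null; return; }  // exit when idle
            }
            f.delete();                       // unlink outside the lock
        }
    }

    public static void main(String[] args) throws Exception {
        AsyncBlockDeleter d = new AsyncBlockDeleter();
        File dir = java.nio.file.Files.createTempDirectory("blocks").toFile();
        long[] ids = new long[30];
        for (int i = 0; i < 30; i++) {
            File f = new File(dir, "blk_" + i);
            f.createNewFile();
            d.addBlock(i, f);
            ids[i] = i;
        }
        d.invalidate(ids);
        // Mapping is gone immediately, even while files are still unlinking.
        System.out.println(d.hasBlock(0));
        Thread w;
        synchronized (d) { w = d.worker; }
        if (w != null) w.join();
        System.out.println(dir.listFiles().length);
    }
}
```

The key property is that visibility (the mapping) and reclamation (the unlink) are decoupled, so heartbeats stay fast regardless of how many files are queued.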

          Konstantin Shvachko added a comment -

          +1 for option 2.
          Making the limit configurable would be good too.
          This, btw, does not exclude option 1 in the future.

          dhruba borthakur added a comment -

          Limit the maximum number of blocks that can be sent in a single blockInvalidate RPC.

          dhruba borthakur added a comment -

          Limit the maximum number of blocks that can be sent in a single blockInvalidate RPC.

          Hadoop QA added a comment -

          +1, http://issues.apache.org/jira/secure/attachment/12346376/chunkinvalidateBlocks.java applied and successfully tested against trunk revision 482405

          dhruba borthakur added a comment -

          Incorporated review comments.

          Doug Cutting added a comment -

          > Incorporated review comments

          I don't see any review comments here or on the dev list. Was there an offline review?

          dhruba borthakur added a comment -

          My apologies. My comment should have said "Better comments in the code".

          Doug Cutting added a comment -

          I just committed this. Thanks, Dhruba!


            People

            • Assignee: dhruba borthakur
            • Reporter: dhruba borthakur