Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.3
    • Fix Version/s: 0.18.4, 0.19.2, 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Occasionally we see some blocks remain under-replicated in our production clusters. This is what we observed:
      1. Sometimes when increasing the replication factor of a file, some blocks belonging to the file do not have their replication increased to the new factor.
      2. When taking a metasave on two different days, the same blocks remain in the under-replication queue.

      Attachments

      1. xmitsSync1.patch
        2 kB
        Hairong Kuang
      2. xmitsSync2.patch
        2 kB
        Hairong Kuang
      3. xmitsSync2-br18.patch
        2 kB
        Hairong Kuang

        Issue Links

          Activity

          Hairong Kuang added a comment -

          Thanks to Koji for his tireless investigation of this issue.

          When this situation occurs, the source DataNode of the block shows abnormal behavior: no blocks get replicated from this node and no blocks get removed from it. Digging into the problem, we see that the NameNode sends the DataNode an empty replication request, i.e. a replication request with no blocks and no targets as parameters, on every heartbeat reply, thus preventing any replication or deletion request from being sent to the node. More suspiciously, the DataNode notifies the NameNode that it has 1 replication in progress, although its jstack shows no replication (data transfer) thread alive.
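          For readers less familiar with the heartbeat protocol, here is a minimal, self-contained sketch of why an empty replication command is harmful. The class and method names below are hypothetical simplifications, not the actual FSNamesystem code; the point, consistent with the observation above, is that the NameNode returns one command per heartbeat reply, and a non-null transfer command, even one carrying zero blocks, takes precedence over the deletion command.

          // Hypothetical simplification of the NameNode's heartbeat reply logic;
          // the real code lives in FSNamesystem/DatanodeDescriptor and differs in detail.
          class HeartbeatSketch {
            static final int TRANSFER = 1, INVALIDATE = 2;

            static class Command {
              final int action;
              final String[] blocks;
              Command(int action, String[] blocks) { this.action = action; this.blocks = blocks; }
            }

            // One command per heartbeat reply: a non-null transfer command, even with
            // zero blocks, shadows the deletion command below it.
            static Command replyToHeartbeat(String[] blocksToReplicate, String[] blocksToDelete) {
              if (blocksToReplicate != null) {        // an empty array still counts as "work"
                return new Command(TRANSFER, blocksToReplicate);
              }
              if (blocksToDelete != null) {
                return new Command(INVALIDATE, blocksToDelete);
              }
              return null;                             // no work for this DataNode
            }
          }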

          Hairong Kuang added a comment - edited

          Two bugs in DFS contributed to the problem:
          (1) The DataNode does not synchronize modifications to the counter "xmitsInProgress", which keeps track of the number of replications in progress. When two threads update the counter concurrently, a race condition may occur, and the counter may end up with a non-zero value when no replication is going on.
          (2) Each DN is configured to have at most 2 replications in progress. When a DN notifies the NN that it has 1 replication in progress, the NN should be able to send it one block replication request. But the NN wrongly interprets the counter as the number of targets: when it sees that a block is scheduled to 2 targets but the DN can only take 1, it sends the DN an empty replication request, blocking all replications from this DataNode. If the DataNode is the only source of an under-replicated block, the block will never get replicated.

          Fixing either (1) or (2) would fix the problem. I think (1) is more fundamental, so I will fix (1) in this jira and file a separate jira to fix (2).
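          A minimal sketch of the race in (1), assuming the pre-patch code used a plain unsynchronized int field (field and method names are illustrative, not the exact DataNode source): two threads executing the non-atomic increment and decrement concurrently can interleave their read-modify-write steps and lose an update, leaving the counter stuck at a non-zero value with no transfer thread alive.

          // Illustrative race on an unsynchronized counter (not the actual DataNode code).
          class XmitsRace {
            private int xmitsInProgress = 0;   // pre-patch style: plain int, no synchronization

            void startTransfer()  { xmitsInProgress++; }  // read-modify-write, not atomic
            void finishTransfer() { xmitsInProgress--; }  // can interleave with ++ and lose an update

            int getXmits() { return xmitsInProgress; }
          }
          // Possible interleaving with two transfer threads A and B, both finishing:
          //   A reads 2, B reads 2, A writes 1, B writes 1  -> counter is 1, but 0 transfers remain.
          // The DataNode then keeps reporting "1 replication in progress" on every heartbeat.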

          Raghu Angadi added a comment -

          This implies that all the blocks that remained under-replicated have only one replica, and that replica is on this specific datanode. Was that the case?

          Hairong Kuang added a comment -

          The previous patch synchronized the counter on the wrong object. This patch uses an AtomicInteger to guarantee atomic modification.
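          A sketch of the AtomicInteger approach described above (field and method names are illustrative, not the exact patch): every update to the counter is atomic, so concurrent transfer threads cannot lose an increment or decrement.

          import java.util.concurrent.atomic.AtomicInteger;

          // Sketch of the fix: atomic updates, no lost increments/decrements.
          class XmitsFixed {
            private final AtomicInteger xmitsInProgress = new AtomicInteger(0);

            void startTransfer()  { xmitsInProgress.incrementAndGet(); }
            void finishTransfer() { xmitsInProgress.decrementAndGet(); }

            int getXmits() { return xmitsInProgress.get(); }  // reported to the NameNode on heartbeat
          }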

          Hairong Kuang added a comment -

          > This implies that all the blocks that remained under-replicated have only one replica, and that replica is on this specific datanode. Was that the case?
          Yes, most of the blocks have only one source. Those are the kind of blocks that initially trigger a DataNode into this state. But we could, and our clusters do, have under-replicated blocks that have two replicas with all of their sources in this state. The only exception is a block in our clusters that has two sources, one in this state but the other is replicating. This block is still under investigation.

          Raghu Angadi added a comment -

          Thanks Hairong. Since a rare race condition is suspected, I thought there would be very few datanodes hitting such a race condition.

          Hairong Kuang added a comment -

          > I thought there would be very few datanodes hitting such a race condition.
          On a cluster with thousands of machines, we saw 5% of the nodes were in this state.

          > The only exception is a block in our clusters that has two sources, one in this state but the other is replicating.
          It turns out that the other source that is replicating has a corrupt copy of the block.

          Raghu Angadi added a comment -

          Since this is such a vital stat, it may be better to decrement it at the top of the finally block (so that some other runtime exception does not cause this situation again).
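          A sketch of the suggestion, with hypothetical method names rather than the actual DataNode/DataXceiver code: the decrement is the first statement in the finally block, so no exception thrown during the transfer, or by later cleanup inside finally, can skip it.

          import java.util.concurrent.atomic.AtomicInteger;

          // Illustrative transfer-thread body (not the actual DataNode code).
          class TransferSketch {
            private final AtomicInteger xmitsInProgress = new AtomicInteger(0);

            void runTransfer() {
              xmitsInProgress.incrementAndGet();
              try {
                sendBlockToTargets();                // may throw at any point
              } finally {
                xmitsInProgress.decrementAndGet();   // decrement first, then do other cleanup
                closeStreamsQuietly();
              }
            }

            private void sendBlockToTargets() { /* stream the block to target DataNodes */ }
            private void closeStreamsQuietly() { /* close sockets and streams */ }
          }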

          Hairong Kuang added a comment -

          The patch incorporates Raghu's comment.

          Raghu Angadi added a comment -

          +1.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12402100/xmitsSync2.patch
          against trunk revision 753052.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/console

          This message is automatically generated.

          Hairong Kuang added a comment -

          Attaching a patch for 0.18.

          Hairong Kuang added a comment -

          I've just committed this.

          Hudson added a comment -

          Integrated in Hadoop-trunk #779 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/779/)
          . Blocks remain under-replicated. Contributed by Hairong Kuang.

          Hairong Kuang added a comment -

          This jira is too trivial to add a unit test.


            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Development