Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.3
    • Fix Version/s: 0.18.4, 0.19.2, 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Occasionally we see some blocks remain under-replicated in our production clusters. This is what we observed:
      1. Sometimes when the replication factor of a file is increased, some blocks belonging to the file never reach the new replication factor.
      2. When comparing metasave output taken on two different days, some blocks remain in the under-replication queue.

      Attachments

      1. xmitsSync2-br18.patch
        2 kB
        Hairong Kuang
      2. xmitsSync2.patch
        2 kB
        Hairong Kuang
      3. xmitsSync1.patch
        2 kB
        Hairong Kuang

        Issue Links

          Activity

          Hairong Kuang created issue -
          Hairong Kuang added a comment -

          Thanks to Koji for his tireless investigation of this issue.

          When this situation occurs, the source DataNode of the block shows abnormal behavior: no blocks get replicated from this node, and no blocks get removed from it. Digging into the problem, we see that the NameNode sends the DataNode an empty replication request, i.e. a replication request with no blocks and no targets as parameters, on every heartbeat reply, thus preventing any replication or deletion request from being sent to the node. More suspiciously, the DataNode notifies the NameNode that it has 1 replication in progress, although its jstack output shows no replication (data transfer) thread alive.

          Hairong Kuang added a comment - edited

          Two bugs in DFS contributed to the problem:
          (1) The DataNode does not synchronize modifications to the counter "xmitsInProgress", which keeps track of the number of replications in progress. When two threads update the counter concurrently, a race condition may occur, and the counter may end up with a non-zero value even though no replication is going on.
          (2) Each DataNode is configured to have at most 2 replications in progress. When the DN notifies the NN that it has 1 replication in progress, the NN should be able to send it one block replication request. But the NN wrongly interprets the counter as the number of targets: when it sees that the block is scheduled to 2 targets but the DN can only take 1, it sends an empty replication request, thereby blocking all replications from this DataNode. If the DataNode is the only source of an under-replicated block, the block will never get replicated.

          Fixing either (1) or (2) could fix the problem. I think (1) is more fundamental, so I will fix (1) in this jira and file a different jira to fix (2).
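
          A minimal sketch of bug (1), for illustration only: the class and thread setup below are hypothetical and not the actual DataNode code, but they show how unsynchronized increments and decrements of a plain int counter like "xmitsInProgress" can lose updates and leave a non-zero value even though no transfer is running.

              // Hypothetical illustration of bug (1): two threads doing unsynchronized
              // read-modify-write on a shared int counter.
              public class XmitsRaceDemo {
                static int xmitsInProgress = 0;   // plain int: ++ and -- are not atomic

                public static void main(String[] args) throws InterruptedException {
                  Runnable transfer = () -> {
                    for (int i = 0; i < 100000; i++) {
                      xmitsInProgress++;          // a simulated transfer starts
                      xmitsInProgress--;          // the transfer ends
                    }
                  };
                  Thread t1 = new Thread(transfer), t2 = new Thread(transfer);
                  t1.start(); t2.start();
                  t1.join();  t2.join();
                  // Lost updates can leave a non-zero value here although no transfer
                  // is in progress -- the symptom described in the comment above.
                  System.out.println("xmitsInProgress = " + xmitsInProgress);
                }
              }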

          Hairong Kuang made changes -
          Field Original Value New Value
          Attachment xmitsSync.patch [ 12402069 ]
          Hairong Kuang made changes -
          Link This issue relates to HADOOP-5479 [ HADOOP-5479 ]
          Raghu Angadi added a comment -

          This implies that all the blocks that remained under-replicated have only one replica, and only on this specific datanode. Was that the case?

          Hairong Kuang added a comment -

          The previous patch synchronized the counter on the wrong object. This patch uses an AtomicInteger to guarantee atomic modification.
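
          A hedged sketch of the AtomicInteger approach described above; apart from "xmitsInProgress", the class and method names are illustrative and not taken from the actual patch.

              // Illustrative only: keeping the counter in an AtomicInteger so that
              // concurrent increments and decrements cannot be lost.
              import java.util.concurrent.atomic.AtomicInteger;

              class TransferCounter {
                private final AtomicInteger xmitsInProgress = new AtomicInteger(0);

                void transferStarted()  { xmitsInProgress.incrementAndGet(); }
                void transferFinished() { xmitsInProgress.decrementAndGet(); }

                // The value reported to the NameNode on each heartbeat.
                int getXmitsInProgress() { return xmitsInProgress.get(); }
              }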

          Hairong Kuang made changes -
          Attachment xmitsSync1.patch [ 12402079 ]
          Hairong Kuang made changes -
          Attachment xmitsSync.patch [ 12402069 ]
          Hairong Kuang added a comment -

          > This implies that all the blocks that remained under-replicated have only one replica and only on this specific datanode. Was that the case?
          Yes, most of the blocks have only one source. Those are the kind of blocks that initially trigger a DataNode into this state. But our clusters also have under-replicated blocks with two replicas where all of the sources are in this state. The only exception is one block in our clusters that has two sources, one in this state but the other replicating. This block is still under investigation.

          Hairong Kuang made changes -
          Fix Version/s 0.19.2 [ 12313650 ]
          Fix Version/s 0.20.0 [ 12313438 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Raghu Angadi added a comment -

          Thanks Hairong. Since a rare race condition is suspected, I thought there would be very few datanodes hitting such a race condition.

          Hairong Kuang added a comment -

          > I thought there would be very few datanodes hitting such a race condition.
          On a cluster with thousands of machines, we saw 5% of the nodes were in this state.

          > The only exception is a block in our clusters that has two sources, one in this state but the other is replicating.
          It turns out that the other source that is replicating has a corrupt copy of the block.

          Raghu Angadi added a comment -

          Since this is such a vital stat, it may be better to decrement at the top of the finally block (so that some other runtime exception does not cause this situation again).
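
          A hypothetical sketch of this suggestion; the class and method names below are made up, and the point is only the placement of the decrement as the first statement of the finally block.

              // Illustration only: the decrement is the first statement of the finally
              // block, so a RuntimeException thrown by the transfer (or by later
              // cleanup) can never leave the counter incremented.
              import java.util.concurrent.atomic.AtomicInteger;

              class BlockTransferRunner {
                private final AtomicInteger xmitsInProgress = new AtomicInteger(0);

                void runTransfer(Runnable doTransfer) {
                  xmitsInProgress.incrementAndGet();
                  try {
                    doTransfer.run();                    // the actual replication work
                  } finally {
                    xmitsInProgress.decrementAndGet();   // decrement first, unconditionally
                    // any other cleanup or logging would go after the decrement
                  }
                }
              }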

          Hairong Kuang added a comment -

          The patch incorporates Raghu's comment.

          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Attachment xmitsSync2.patch [ 12402100 ]
          Raghu Angadi added a comment -

          +1.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12402100/xmitsSync2.patch
          against trunk revision 753052.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/82/console

          This message is automatically generated.

          Hairong Kuang added a comment -

          Attaching a patch for 0.18.

          Hairong Kuang made changes -
          Attachment xmitsSync2-br18.patch [ 12402167 ]
          Hairong Kuang added a comment -

          I've just committed this.

          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #779 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/779/)
          . Blocks remain under-replicated. Contributed by Hairong Kuang.

          Hairong Kuang added a comment -

          This jira is too trivial to add a unit test.

          Nigel Daley made changes -
          Fix Version/s 0.21.0 [ 12313563 ]
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Transition                  | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Open -> Patch Available     | 1d 29m                | 1               | Hairong Kuang | 12/Mar/09 23:33
          Patch Available -> Resolved | 20h 26m               | 1               | Hairong Kuang | 13/Mar/09 19:59
          Resolved -> Closed          | 40d 23h 18m           | 1               | Nigel Daley   | 23/Apr/09 20:18

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0
              Watchers:
              3
