Hadoop Common
  1. Hadoop Common
  2. HADOOP-4971

Block report times from datanodes could converge to same time.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      A long (unexpected) delay at datanodes could make subsequent block reports from many datanode at the same time.

      Description

      Datanode block reports take quite a bit of memory to process at the namenode. After the inital report, DNs pick a random time to spread this load across at the NN. This normally works fine.

      Block reports are sent inside "offerService()" thread in DN. If for some reason this thread was stuck for long time (comparable to block report interval), and same thing happens on many DNs, all of them get back to the loop at the same time and start sending block report then and every hour at the same time.

      RPC server and clients in 0.18 can handle this situation fine. But since this is a memory intensive RPC it lead to large GC delays at the NN. We don't know yet why offerService therads seemed to be stuck, but DN should re-randomize it block report time in such cases.

      1. HADOOP-4971.patch
        1 kB
        Raghu Angadi
      2. HADOOP-4971.patch
        1 kB
        Raghu Angadi
      3. HADOOP-4971-branch-18.patch
        1.0 kB
        Raghu Angadi

        Activity

        Hide
        Raghu Angadi added a comment -

        Regd testing the feature :

        I have manually tested this by introducing an explicit delay once in a while in offerService() at the datanode.

        Also modified heartbeats interval to be very low so that I can keep block report interval to be just a few seconds.

        Say block report interval is 4 seconds.

        Without the patch : When the delay of 10 seconds occurs t, subsequent block reports occur at t+14, t+18 etc.

        With the patch : subsequent reports occur at t+12, t+16, t+20.

        This confirms that if such a delay were to occur at the many datanodes at the same time, their subsequent blockreport occur based on what the intitial blockReport time was rather than when the delay ended.

        Show
        Raghu Angadi added a comment - Regd testing the feature : I have manually tested this by introducing an explicit delay once in a while in offerService() at the datanode. Also modified heartbeats interval to be very low so that I can keep block report interval to be just a few seconds. Say block report interval is 4 seconds. Without the patch : When the delay of 10 seconds occurs t, subsequent block reports occur at t+14, t+18 etc. With the patch : subsequent reports occur at t+12, t+16, t+20. This confirms that if such a delay were to occur at the many datanodes at the same time, their subsequent blockreport occur based on what the intitial blockReport time was rather than when the delay ended.
        Hide
        Raghu Angadi added a comment -

        I just committed this to 0.18 and up.

        Show
        Raghu Angadi added a comment - I just committed this to 0.18 and up.
        Hide
        Raghu Angadi added a comment -

        test-patch :

        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
        [exec] Please justify why no tests are needed for this patch.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        Show
        Raghu Angadi added a comment - test-patch : [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good.

        Show
        Tsz Wo Nicholas Sze added a comment - +1 patch looks good.
        Hide
        Raghu Angadi added a comment -

        Thanks Nicholas. Updated patch uses the simpler equation. The comment is slightly modified.

        Show
        Raghu Angadi added a comment - Thanks Nicholas. Updated patch uses the simpler equation. The comment is slightly modified.
        Hide
        Raghu Angadi added a comment -

        > [...] formula can be simplified as below
        right.

        > I forgot to mention that the comment is a little bit confusing: the next report is actually around 11:35:43. The reports after the next will be xx:20:14.

        There are 3 reports involved in the comment : last, current, and next. Are you mixing "current" and "next"? The current report returned at 11:35:14 (we don't care when it was started).

        Show
        Raghu Angadi added a comment - > [...] formula can be simplified as below right. > I forgot to mention that the comment is a little bit confusing: the next report is actually around 11:35:43. The reports after the next will be xx:20:14. There are 3 reports involved in the comment : last, current, and next. Are you mixing "current" and "next"? The current report returned at 11:35:14 (we don't care when it was started).
        Hide
        Tsz Wo Nicholas Sze added a comment -

        > 2) unexpected like 11:35:43, next report should be at 12:20:14

        I forgot to mention that the comment is a little bit confusing: the next report is actually around 11:35:43. The reports after the next will be xx:20:14.

        Show
        Tsz Wo Nicholas Sze added a comment - > 2) unexpected like 11:35:43, next report should be at 12:20:14 I forgot to mention that the comment is a little bit confusing: the next report is actually around 11:35:43. The reports after the next will be xx:20:14.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Patch looks good except that the formula can be simplified as below

            lastBlockReport += (System.currentTimeMillis() - lastBlockReport) /blockReportInterval * blockReportInterval;
        
        Show
        Tsz Wo Nicholas Sze added a comment - Patch looks good except that the formula can be simplified as below lastBlockReport += ( System .currentTimeMillis() - lastBlockReport) /blockReportInterval * blockReportInterval;
        Hide
        Raghu Angadi added a comment -

        A simple patch is attached.

        > If there are many datanodes doing block reports at 8:20:xx,
        Not expected to, since it already randomized. This does not propose to fix the case where most datanodes randomly assign themselves a number within small range of the block report interval.

        > then these datanodes will keep doing block report at the same time for the proposed fix.

        The patch does not change this behaviour. It is same as before. The start start time is already randomized. I might be misunderstanding. Could you give a concrete example (preferably with numbers) that shows the problem you are takinng about?

        Show
        Raghu Angadi added a comment - A simple patch is attached. > If there are many datanodes doing block reports at 8:20:xx, Not expected to, since it already randomized. This does not propose to fix the case where most datanodes randomly assign themselves a number within small range of the block report interval. > then these datanodes will keep doing block report at the same time for the proposed fix. The patch does not change this behaviour. It is same as before. The start start time is already randomized. I might be misunderstanding. Could you give a concrete example (preferably with numbers) that shows the problem you are takinng about?
        Hide
        Tsz Wo Nicholas Sze added a comment -

        > say, last block report was at 8:20:14...
        If there are many datanodes doing block reports at 8:20:xx, then these datanodes will keep doing block report at the same time for the proposed fix.

        > .. but DN should re-randomize it block report time in such cases.
        +1 I think it is good to randomize block reports.

        Show
        Tsz Wo Nicholas Sze added a comment - > say, last block report was at 8:20:14... If there are many datanodes doing block reports at 8:20:xx, then these datanodes will keep doing block report at the same time for the proposed fix. > .. but DN should re-randomize it block report time in such cases. +1 I think it is good to randomize block reports.
        Hide
        Raghu Angadi added a comment -

        The delay need not be (in fact, it was not) in block report RPC. This was else where in the offerService loop. Something at NameNode caused this.

        Proposed fix (comment from the patch) :

                    /* say, last block report was at 8:20:14. current report should have
                     * occured around 9:20:14 (with default 1 hour block reports). 
                     * If current time is something :
                     * 1) normal like 9:20:18, next report should be at 10:20:14
                     * 2) unexpected like 11:35:43, next report should be at 12:20:14
                     */
        
        Show
        Raghu Angadi added a comment - The delay need not be (in fact, it was not) in block report RPC. This was else where in the offerService loop. Something at NameNode caused this. Proposed fix (comment from the patch) : /* say, last block report was at 8:20:14. current report should have * occured around 9:20:14 (with default 1 hour block reports). * If current time is something : * 1) normal like 9:20:18, next report should be at 10:20:14 * 2) unexpected like 11:35:43, next report should be at 12:20:14 */
        Hide
        Tsz Wo Nicholas Sze added a comment -

        > Block reports are sent inside "offerService()" thread in DN. If for some reason this thread was stuck for long time (comparable to block report interval), and same thing happens on many DNs, all of them get back to the loop at the same time and start sending block report then and every hour at the same time.

        Seems that there is a hidden synchronization of block reports happening. We probably should let Datanodes sleep a random interval between block reports.

        Show
        Tsz Wo Nicholas Sze added a comment - > Block reports are sent inside "offerService()" thread in DN. If for some reason this thread was stuck for long time (comparable to block report interval), and same thing happens on many DNs, all of them get back to the loop at the same time and start sending block report then and every hour at the same time. Seems that there is a hidden synchronization of block reports happening. We probably should let Datanodes sleep a random interval between block reports.

          People

          • Assignee:
            Raghu Angadi
            Reporter:
            Raghu Angadi
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development