Hadoop HDFS
HDFS-1767 (sub-task of HDFS-2126: Improve Namenode startup time [umbrella task])

Namenode should ignore non-initial block reports from datanodes when in safemode during startup

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.203.0, 0.22.0, 0.23.0
    • Fix Version/s: 0.20.204.0, 0.23.0
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      block report, performance, startup

      Description

      Consider a large cluster that takes 40 minutes to start up. The datanodes compete to register and send their Initial Block Reports (IBRs) as fast as they can after startup (subject to a small sub-two-minute random delay, which isn't relevant to this discussion).

      As each datanode succeeds in sending its IBR, it schedules the starting time for its regular cycle of reports, every hour (or other configured value of dfs.blockreport.intervalMsec). In order to spread the reports evenly across the block report interval, each datanode picks a random fraction of that interval, for the starting point of its regular report cycle. For example, if a particular datanode ends up randomly selecting 18 minutes after the hour, then that datanode will send a Block Report at 18 minutes after the hour every hour as long as it remains up. Other datanodes will start their cycles at other randomly selected times. This code is in DataNode.blockReport() and DataNode.scheduleBlockReport().
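The spreading scheme described above can be sketched as follows (illustrative Python, not the actual DataNode.scheduleBlockReport() code — function and variable names here are made up for the example):

```python
import random

def schedule_next_report(now_ms, interval_ms):
    """Pick a uniformly random point within one block report interval,
    so that steady-state reports from many datanodes spread evenly
    across the interval instead of arriving in a burst."""
    offset_ms = int(random.random() * interval_ms)  # uniform in [0, interval)
    return now_ms + offset_ms

# Example: with a one-hour interval, the next report lands somewhere in
# the hour following the IBR.
next_at = schedule_next_report(0, 3_600_000)
```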

      The "second Block Report" (2BR) is the start of these hourly reports. The problem is that some of these 2BRs get scheduled sooner rather than later, and actually occur within the startup period. For example, if the cluster takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that succeed in sending their IBRs during the first 10 minutes, between 1/2 and 2/3 of them will send their 2BR before the 40-minute startup time has completed!
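The "between 1/2 and 2/3" figure follows directly from the uniform random offset; a quick check:

```python
STARTUP_MIN = 40.0     # observed startup duration (2/3 of an hour)
INTERVAL_MIN = 60.0    # block report interval, expressed in minutes

def p_2br_during_startup(ibr_minute):
    # A datanode whose IBR completes at minute t sends its 2BR at
    # t + U[0, 60) minutes, so the 2BR precedes the 40-minute mark
    # with probability (40 - t) / 60.
    return (STARTUP_MIN - ibr_minute) / INTERVAL_MIN

p_2br_during_startup(0)    # 2/3 for the earliest reporters
p_2br_during_startup(10)   # 1/2 for those finishing at minute 10
```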

      2BRs sent within the startup time actually compete with the remaining IBRs, and thereby slow down the overall startup process. This can be seen in the following data, which shows the startup process for a 3700-node cluster that took about 17 minutes to finish startup:

      min time        starts  sum   regs  sum   IBR  sum  2nd_BR sum total_BRs/min
      0   1299799498  3042  3042  1969  1969  151   151          0  151
      1   1299799558   665  3707  1470  3439  248   399          0  248
      2   1299799618        3707   224  3663  270   669          0  270
      3   1299799678        3707    14  3677  261   930    3     3  264
      4   1299799738        3707    23  3700  288  1218    1     4  289
      5   1299799798        3707     7  3707  258  1476    3     7  261
      6   1299799858        3707        3707  317  1793    4    11  321
      7   1299799918        3707        3707  292  2085    6    17  298
      8   1299799978        3707        3707  292  2377    8    25  300
      9   1299800038        3707        3707  272  2649         25  272
      10  1299800098        3707        3707  280  2929   15    40  295
      11  1299800158        3707        3707  223  3152   14    54  237
      12  1299800218        3707        3707  143  3295         54  143
      13  1299800278        3707        3707  141  3436   20    74  161
      14  1299800338        3707        3707  195  3631   78   152  273
      15  1299800398        3707        3707   51  3682  209   361  260
      16  1299800458        3707        3707   25  3707  369   730  394
      17  1299800518        3707        3707       3707  166   896  166
      18  1299800578        3707        3707       3707   72   968   72
      19  1299800638        3707        3707       3707   67  1035   67
      20  1299800698        3707        3707       3707   75  1110   75
      21  1299800758        3707        3707       3707   71  1181   71
      22  1299800818        3707        3707       3707   67  1248   67
      23  1299800878        3707        3707       3707   62  1310   62
      24  1299800938        3707        3707       3707   56  1366   56
      25  1299800998        3707        3707       3707   60  1426   60
      

      This data was harvested from the startup logs of all the datanodes, and correlated into one-minute buckets. Each row of the table represents the progress during one elapsed minute of clock time. It seems that every cluster startup is different, but this one showed the effect fairly well.

      The "starts" column shows that all the nodes started up within the first 2 minutes, and the "regs" column shows that all succeeded in registering by minute 6. The IBR column shows a sustained rate of Initial Block Report processing of 250-300/minute for the first 10 minutes.

      The question is why, during minutes 11 through 16, the rate of IBR processing slowed down. Why didn't the startup just finish? In the "2nd_BR" column, we see the rate of 2BRs ramping up as more datanodes complete their IBRs. As the rate increases, they become more effective at competing with the IBRs, and slow down the IBR processing even more. After the IBRs finally finish in minute 16, the rate of 2BRs settles down to a steady ~60-70/minute.

      In order to decrease competition for locks and other resources, to speed up IBR processing during startup, we propose to delay 2BRs until later into the cycle.

      1. table.csv
        2 kB
        Matt Foley
      2. table_tab.csv
        2 kB
        Matt Foley
      3. DelaySecondBR_v3-0.20-security.patch
        1 kB
        Matt Foley
      4. DelaySecondBR_v3.patch
        1 kB
        Matt Foley
      5. DelaySecondBR_v2.patch
        1 kB
        Matt Foley
      6. DelaySecondBR_v1.patch
        15 kB
        Matt Foley

        Activity

        Matt Foley created issue -
        Matt Foley added a comment -

        Since the above data table lost formatting, I've attached two .csv files: table.csv is comma-delimited, and table_tab.csv is tab-delimited.

        Matt Foley made changes -
        Field Original Value New Value
        Attachment table.csv [ 12473977 ]
        Attachment table_tab.csv [ 12473978 ]
        Matt Foley added a comment -

        Here is one way to implement the improvement. This patch provides a new config parameter, dfs.blockreport.secondaryDelay, which defaults to 1/2 of dfs.blockreport.intervalMsec. 2BRs are delayed by this amount in addition to the original random interval in the range [0, dfs.blockreport.intervalMsec].

        This code is right alongside the use of dfs.blockreport.intervalMsec, dfs.blockreport.initialDelay, and dfs.heartbeat.interval, all of which were using non-standard defaults. So I took the opportunity to also fix those usages and eliminate FSConstants.HEARTBEAT_INTERVAL, BLOCKREPORT_INTERVAL, and BLOCKREPORT_INITIAL_DELAY in favor of their DFSConfigKeys equivalents, while introducing dfs.blockreport.secondaryDelay.
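The v1 proposal above can be sketched like this (illustrative Python, not the actual patch; the function name and unit handling are assumptions for the example — only the config key dfs.blockreport.secondaryDelay and its default of half the interval come from the comment):

```python
import random

def schedule_second_report(now_ms, interval_ms, secondary_delay_ms=None):
    """Delay the second block report by a configured secondaryDelay
    (default: half the report interval) on top of the usual random
    offset in [0, interval), pushing 2BRs past the startup window."""
    if secondary_delay_ms is None:
        secondary_delay_ms = interval_ms // 2  # default per the comment above
    offset_ms = int(random.random() * interval_ms)
    return now_ms + secondary_delay_ms + offset_ms
```

With a one-hour interval, the 2BR then lands between 30 and 90 minutes after the IBR instead of 0 to 60.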

        Matt Foley made changes -
        Attachment DelaySecondBR_v1.patch [ 12473980 ]
        Matt Foley made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12473980/DelaySecondBR_v1.patch
        against trunk revision 1082263.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestFileConcurrentReader
        org.apache.hadoop.hdfs.TestHFlush

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/270//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/270//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/270//console

        This message is automatically generated.

        Hairong Kuang made changes -
        Description [edited to wrap the data table in {noformat} tags; text otherwise identical to the description above]
        Hairong Kuang added a comment -

        Great data and findings! This will definitely help speed up NN startup time.

        Matt Foley added a comment -

        The core tests (TestHFlush and TestFileConcurrentReader) and contrib test (hdfsproxy.TestAuthorizationFilter) cited seem unrelated to this patch. Furthermore, these tests pass in my environment. I think these are false positives. Thanks.

        dhruba borthakur added a comment -

        Good stuff. This patch looks good.

        In the longer term, it would be nice if we can make the namenode ask for block reports, rather than the datanode sending block reports voluntarily. This will help flow control/congestion control at namenode.

        Konstantin Shvachko added a comment -

        Great analysis indeed, Matt.
        Can we make NN just ignore the second BR, while it is in the start up mode?
        Trying to avoid introducing more configuration parameters.

        Matt Foley added a comment -

        We could, but I think this is just as effective and the code complexity increment is much lower. Less chance for unforeseen consequences and hidden bugs.

        Suresh Srinivas added a comment -

        I agree with Konstantin on not adding a new configuration for delaying the second block report, as this would be an additional config that needs to be tweaked based on the size of the system and its startup time. For the first cut, I am fine with ignoring the second block report. This has no bad side effect.

        I liked Dhruba's suggestion. However I feel Dhruba's change is much harder to get right, with NN having to do the flow control of block reports from all the datanodes.

        Another option we could consider is to send a HeartBeatResponse instead of DatanodeCommand[] in response to the datanode heartbeat request. This response could include namenode state, such as in safemode or out of safemode. This information could be used by the datanode to decide when to send the second BR. Additionally, the namenode could communicate other information to the datanode in the future, such as load, which could help throttle the load on the namenode at the source.
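The heartbeat-response idea floated above might look roughly like this (a minimal sketch; the type and field names are hypothetical, not Hadoop APIs):

```python
from dataclasses import dataclass, field
from enum import Enum

class NNState(Enum):
    STARTUP_SAFEMODE = 1   # safemode entered automatically at startup
    MANUAL_SAFEMODE = 2    # safemode entered by the operator later
    ACTIVE = 3

@dataclass
class HeartbeatResponse:
    """Heartbeat reply that carries namenode state alongside commands,
    so the datanode can decide when to send its second block report."""
    nn_state: NNState
    commands: list = field(default_factory=list)

def should_send_second_report(resp: HeartbeatResponse) -> bool:
    # Hold the 2BR only while the namenode is still starting up.
    return resp.nn_state is not NNState.STARTUP_SAFEMODE
```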

        Matt Foley added a comment -

        Okay. We only want to ignore second and later Block Reports during startup safemode, not if the system is put back into safemode later. So I will merge the proposed fix for HDFS-1726 (query method for what kind of safe mode the Namenode is in) into this fix. Patch to come shortly.
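The approach settled on here — ignore non-initial reports only during *startup* safemode — can be sketched as (illustrative Python, not the actual patch):

```python
def process_block_report(node_id, in_startup_safemode, reported_nodes):
    """Namenode-side sketch: process each datanode's first report
    normally, but while in startup safemode discard any subsequent
    reports from a node whose initial report was already processed.
    Manual safemode (entered later by an operator) is unaffected."""
    if in_startup_safemode and node_id in reported_nodes:
        return "ignored"       # non-initial report during startup: skip the work
    reported_nodes.add(node_id)
    return "processed"
```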

        Nathan Roberts added a comment -

        I feel like Dhruba's suggestion deserves a new jira. It feels more correct for the namenode to ask for these things as opposed to the datanodes sending them unsolicited. I don't know how reasonable it would be, but maybe a shorter term fix along these lines would be to send a DataNodeCommand to enable 2BRs?

        Matt Foley added a comment -

        Really short patch. Decided to leave HDFS-1726 in its own Jira and just did the minimum here.

        Matt Foley made changes -
        Attachment DelaySecondBR_v2.patch [ 12474670 ]
        Matt Foley made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Matt Foley made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12474670/DelaySecondBR_v2.patch
        against trunk revision 1085509.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestHDFSCLI
        org.apache.hadoop.hdfs.server.datanode.TestBlockReport
        org.apache.hadoop.hdfs.TestDFSShell

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/287//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/287//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/287//console

        This message is automatically generated.

        Matt Foley added a comment -

        Out of the 10 core and contrib tests said to fail, all but one seem to have nothing to do with this patch, and indeed are stated to be failing since well before this patch was made available. So I don't think they should show up as a negative on this patch.

        The one test case that might have something to do with this patch, org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport_08, runs fine in my environment; I was very careful to ensure that the patch was indeed present in the compilation. Furthermore, Hudson's delta calculation says it was already failing before this patch was applied. And reviewing the logs, the test cluster is seen to leave Safe Mode successfully, so it is unlikely that this patch has anything to do with the failure.

        Please consider these test failures to be false positives.

        Matt Foley added a comment -

        Minor tweak to log string, caught by Suresh.

        Matt Foley made changes -
        Attachment DelaySecondBR_v3.patch [ 12474913 ]
        Suresh Srinivas made changes -
        Summary Delay second Block Reports until after cluster finishes startup, to improve startup times Namenode should ignore second block report from datanodes when in safemode during startup
        Affects Version/s 0.20.204 [ 12316319 ]
        Affects Version/s 0.23.0 [ 12315571 ]
        Suresh Srinivas made changes -
        Summary Namenode should ignore second block report from datanodes when in safemode during startup Namenode should ignore non-initial block reports from datanodes when in safemode during startup
        Suresh Srinivas added a comment -

        I committed this patch to trunk. Thank you Matt.

        Suresh Srinivas made changes -
        Status: Patch Available [ 10002 ] → Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Matt Foley added a comment -

        Uploaded corresponding patch for v0.20-security.

        Matt Foley made changes -
        Attachment DelaySecondBR_v3-0.20-security.patch [ 12474920 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12474913/DelaySecondBR_v3.patch
        against trunk revision 1086654.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestHDFSCLI
        org.apache.hadoop.hdfs.server.datanode.TestBlockReport
        org.apache.hadoop.hdfs.TestDFSShell
        org.apache.hadoop.hdfs.TestFileConcurrentReader

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/302//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/302//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/302//console

        This message is automatically generated.

        Matt Foley added a comment -

        The findbugs warning is spurious - it is in FSDataset, which was not changed by this patch.

        Regarding the core and contrib test failures, the same comments apply as above (see https://issues.apache.org/jira/browse/HDFS-1767?focusedCommentId=13012616&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13012616)

        Hairong Kuang added a comment -

        Matt, the committed patch cuts the non-initial block report processing time, but it does not remove the cost of sending/receiving the non-initial block reports, as your first patch did. Will you have a follow-up jira for removing it?

        Suresh Srinivas added a comment -

        I committed the change to 20.204 as well.

        Suresh Srinivas made changes -
        Fix Version/s 0.20.204 [ 12316319 ]
        Matt Foley added a comment -

        Hairong, that's a valid point. However, the send/receive overhead is fully concurrent, so the effective cost is small as long as the block report processing remains single-threaded (under the global FSNamesystem write lock).

        It sounds like Dhruba and Nathan would like to see another way of doing flow-control on the BRs. Dhruba, would you like to open a Jira suggesting a design direction?

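        The gating behavior under discussion can be sketched as follows. This is a minimal illustrative sketch, not the actual FSNamesystem code: the class and method names (`BlockReportGate`, `shouldProcess`, `leaveSafeMode`) are hypothetical, and real HDFS tracks per-datanode state in its datanode descriptors rather than a bare set.

        ```java
        import java.util.HashSet;
        import java.util.Set;

        public class BlockReportGate {
            // True while the namenode is still in startup safemode.
            private boolean inStartupSafeMode = true;
            // Datanodes whose initial block report (IBR) has been processed.
            private final Set<String> reportedNodes = new HashSet<>();

            /** Returns true if this report should be processed, false if ignored. */
            public boolean shouldProcess(String datanodeId) {
                boolean initialReport = !reportedNodes.contains(datanodeId);
                if (inStartupSafeMode && !initialReport) {
                    // Non-initial report during startup: discard it cheaply
                    // rather than reprocessing already-known blocks under the
                    // global FSNamesystem write lock.
                    return false;
                }
                reportedNodes.add(datanodeId);
                return true;
            }

            public void leaveSafeMode() {
                inStartupSafeMode = false;
            }

            public static void main(String[] args) {
                BlockReportGate gate = new BlockReportGate();
                gate.shouldProcess("dn1");   // initial report: processed
                gate.shouldProcess("dn1");   // second report in safemode: ignored
                gate.leaveSafeMode();
                gate.shouldProcess("dn1");   // regular report after startup: processed
            }
        }
        ```

        Under this sketch, the datanodes still send their second reports (the send/receive cost Hairong mentions remains), but the namenode drops them before the expensive single-threaded processing step.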
        Suresh Srinivas made changes -
        Affects Version/s 0.20.203.0 [ 12316150 ]
        Affects Version/s 0.20.204 [ 12316319 ]
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #582 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/582/)

        Hairong Kuang added a comment -

        Re-submitting to trigger auto testing.

        Hairong Kuang made changes -
        Attachment trunkLocalNameImage9.patch [ 12476359 ]
        Hairong Kuang made changes -
        Attachment trunkLocalNameImage9.patch [ 12476359 ]
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

        Matt Foley made changes -
        Parent HDFS-2126 [ 12512881 ]
        Issue Type: Improvement [ 4 ] → Sub-task [ 7 ]
        Owen O'Malley added a comment -

        Hadoop 0.20.204.0 was released.

        Owen O'Malley made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]

          People

          • Assignee: Matt Foley
          • Reporter: Matt Foley
          • Votes: 0
          • Watchers: 2
