  Hadoop HDFS > HDFS-7836 BlockManager Scalability Improvements > HDFS-7923

The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set that requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR; if not, it will wait until later. This can be done compatibly with optional protobuf fields.
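
      As a rough illustration only (not code from any attached patch), the request/permission handshake could look like this on the DN side; the HeartbeatReply type and the method names below are invented for the sketch:

        public class FullBlockReportHandshake {
          /** Toy stand-in for the flag the NN would set in its heartbeat response. */
          static final class HeartbeatReply {
            final boolean okToSendFullBlockReport;
            HeartbeatReply(boolean ok) { this.okToSendFullBlockReport = ok; }
          }

          private boolean fullBlockReportDue = true;  // driven by the DN's own BR timer

          /** Called once per heartbeat round trip; the DN keeps asking until told yes. */
          void onHeartbeatReply(HeartbeatReply reply) {
            if (fullBlockReportDue && reply.okToSendFullBlockReport) {
              sendFullBlockReport();
              fullBlockReportDue = false;
            }
          }

          /** Placeholder for actually building and sending the FBR. */
          private void sendFullBlockReport() {
            System.out.println("sending full block report");
          }
        }

      Attachments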

      1. HDFS-7923.000.patch
        41 kB
        Charles Lamb
      2. HDFS-7923.001.patch
        41 kB
        Charles Lamb
      3. HDFS-7923.002.patch
        46 kB
        Charles Lamb
      4. HDFS-7923.003.patch
        74 kB
        Colin P. McCabe
      5. HDFS-7923.004.patch
        80 kB
        Colin P. McCabe
      6. HDFS-7923.006.patch
        77 kB
        Colin P. McCabe
      7. HDFS-7923.007.patch
        77 kB
        Colin P. McCabe

        Issue Links

          Activity

          clamb Charles Lamb added a comment -

          Here is a description of the heuristic my patch implements for the NN to determine what to send back in response to the "should I send a BR?" question. To keep it relatively simple, let's consider three parameters:

          • The maximum number of FBR requests that the NN is willing to process at any given time (to be called 'dfs.namenode.max.concurrent.block.reports', with a default of Integer.MAX_VALUE)
          • The DN's configured block report interval (dfs.blockreport.intervalMsec). This parameter already exists.
          • The max time we ever want the NN to go without receiving an FBR from a given DN ('dfs.blockreport.max.deferMsec').

          If the time since the last FBR received from the DN is less than dfs.blockreport.intervalMsec, the NN returns false ("No, don't send an FBR"). In theory, this should never happen if the DN is obeying dfs.blockreport.intervalMsec.

          If the number of block reports currently being processed by an NN is less than dfs.namenode.max.concurrent.block.reports, and the time since it last received an FBR from the DN sending the heartbeat is greater than dfs.blockreport.intervalMsec, then the NN automatically answers true ("Yes, send along an FBR").

          If the number of BRs being processed by the NN is greater than dfs.namenode.max.concurrent.block.reports when it receives the heartbeat, it checks the last time it received an FBR from the DN sending the heartbeat. If that was more than dfs.blockreport.max.deferMsec ago, it returns true ("Yes, send along an FBR"); if the time since the last FBR is less than dfs.blockreport.max.deferMsec, it returns false.
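
          To make the branches concrete, here is a compact, hypothetical rendering of that heuristic in Java. It is an editor's sketch of the logic described above, not code from the attached patch; the class and parameter names are invented, though they mirror the config keys:

            public class FullBlockReportGate {
              private final int maxConcurrentBlockReports;  // dfs.namenode.max.concurrent.block.reports
              private final long blockReportIntervalMs;     // dfs.blockreport.intervalMsec
              private final long maxDeferMs;                // dfs.blockreport.max.deferMsec

              public FullBlockReportGate(int maxConcurrent, long intervalMs, long maxDeferMs) {
                this.maxConcurrentBlockReports = maxConcurrent;
                this.blockReportIntervalMs = intervalMs;
                this.maxDeferMs = maxDeferMs;
              }

              /**
               * @param inFlightReports number of FBRs the NN is processing right now
               * @param msSinceLastFbr  time since this DN's last FBR was received
               * @return true if the NN should tell the DN to send an FBR now
               */
              public boolean shouldSendFullBlockReport(int inFlightReports, long msSinceLastFbr) {
                if (msSinceLastFbr < blockReportIntervalMs) {
                  return false;                       // too soon; the DN shouldn't even be asking
                }
                if (inFlightReports < maxConcurrentBlockReports) {
                  return true;                        // NN has spare capacity
                }
                return msSinceLastFbr >= maxDeferMs;  // overloaded, but don't defer forever
              }
            }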

          clamb Charles Lamb added a comment -

          Attached is a patch that implements the behavior I described.

          clamb Charles Lamb added a comment -

          Colin P. McCabe, attached is a patch that is rebased onto the trunk.

          cmccabe Colin P. McCabe added a comment - edited

          Thanks, Charles Lamb. I like this approach. It avoids sending the block report until the NN requests it. So we don't have to throw away a whole block report to achieve backpressure.

            public static final String  DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_KEY = "dfs.namenode.max.concurrent.block.reports";
            public static final int     DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_DEFAULT = Integer.MAX_VALUE;
          

          It seems like this should default to something less than the default number of RPC handler threads, not to MAX_INT. Given that dfs.namenode.handler.count = 10, it seems like this should be no more than 5 or 6, right? The main point here is to avoid having the NN handler threads completely choked with block reports, and that is defeated if the value is MAX_INT. I realize that you probably intended this to be configured, but it seems like we should have a reasonable default that works for most people.

          --- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          +++ hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          @@ -195,6 +195,7 @@ message HeartbeatRequestProto {
             optional uint64 cacheCapacity = 6 [ default = 0 ];
             optional uint64 cacheUsed = 7 [default = 0 ];
             optional VolumeFailureSummaryProto volumeFailureSummary = 8;
          +  optional bool requestSendFullBlockReport = 9;
           }
          

          Let's have a [default = false] here so that we don't have to add a bunch of clunky HasFoo checks. Unless there is something we'd like to do differently in the "false" and "not present" cases, but I can't think of what that would be.

          +  /* Number of block reports currently being processed. */
          +  private final AtomicInteger blockReportProcessingCount = new AtomicInteger(0);
          

          I'm not sure an AtomicInteger makes sense here. We only modify this variable (write to it) when holding the FSN lock in write mode, right? And we only read from it when holding the FSN in read mode. So, there isn't any need to add atomic ops.

          +      boolean okToSendFullBlockReport = true;
          +      if (requestSendFullBlockReport &&
          +          blockManager.getBlockReportProcessingCount() >=
          +              maxConcurrentBlockReports) {
          +        /* See if we should tell DN to back off for a bit. */
          +        final long lastBlockReportTime = blockManager.getDatanodeManager().
          +            getDatanode(nodeReg).getLastBlockReportTime();
          +        if (lastBlockReportTime > 0) {
          +          /* We've received at least one block report. */
          +          final long msSinceLastBlockReport = now() - lastBlockReportTime;
          +          if (msSinceLastBlockReport < maxBlockReportDeferralMsec) {
          +            /* It hasn't been long enough to allow a BR to pass through. */
          +            okToSendFullBlockReport = false;
          +          }
          +        }
          +      }
          +      return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo,
          +          okToSendFullBlockReport);
          

          There is a TOCTOU (time of check, time of use) race condition here, right? 1000 datanodes come in and ask me whether it's ok to send an FBR. In each case, I check the number of ongoing FBRs, which is 0, and say "yes." Then 1000 FBRs arrive all at once and the NN melts down.

          I think we need to track which datanodes we gave the "green light" to, and not decrement the counter until they either send that report, or some timeout expires. (We need the timeout in case datanodes go away after requesting permission-to-send.) The timeout can probably be as short as a few minutes. If you can't manage to send an FBR in a few minutes, there's more problems going on.
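
          One way to close that race, sketched below purely for illustration: count a DN against the limit as soon as it gets the green light, and free the slot when its report arrives or a timeout expires. The class and method names are invented; the committed patch ended up using a lease mechanism instead (see the later comments):

            import java.util.HashMap;
            import java.util.Iterator;
            import java.util.Map;

            public class PendingFullBlockReports {
              private final int maxConcurrent;
              private final long grantTimeoutMs;
              private final Map<String, Long> grantTimeByDatanode = new HashMap<>();

              public PendingFullBlockReports(int maxConcurrent, long grantTimeoutMs) {
                this.maxConcurrent = maxConcurrent;
                this.grantTimeoutMs = grantTimeoutMs;
              }

              /** Called from the heartbeat path when a DN asks to send an FBR. */
              public synchronized boolean tryGrant(String datanodeUuid, long nowMs) {
                expireStaleGrants(nowMs);
                if (grantTimeByDatanode.containsKey(datanodeUuid)) {
                  return true;                             // already holds a grant
                }
                if (grantTimeByDatanode.size() >= maxConcurrent) {
                  return false;                            // at capacity; ask again later
                }
                grantTimeByDatanode.put(datanodeUuid, nowMs);
                return true;
              }

              /** Called when the FBR actually arrives. */
              public synchronized void reportReceived(String datanodeUuid) {
                grantTimeByDatanode.remove(datanodeUuid);
              }

              private void expireStaleGrants(long nowMs) {
                for (Iterator<Map.Entry<String, Long>> it =
                     grantTimeByDatanode.entrySet().iterator(); it.hasNext(); ) {
                  if (nowMs - it.next().getValue() > grantTimeoutMs) {
                    it.remove();                           // DN went away or is too slow
                  }
                }
              }
            }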

            public static final String  DFS_BLOCKREPORT_MAX_DEFER_MSEC_KEY = "dfs.blockreport.max.deferMsec";
            public static final long    DFS_BLOCKREPORT_MAX_DEFER_MSEC_DEFAULT = Long.MAX_VALUE;
          

          Do we really need this config key? It seems like we added it because we wanted to avoid starvation (i.e. the case where a given DN never gets the green light). But we are maintaining the last FBR time for each DN anyway. Surely we can just have a TreeMap or something and tell the DNs with the oldest lastSentTime to go. There aren't an infinite number of datanodes in the cluster, so eventually everyone will get the green light.

          I really would prefer not to have this tunable at all, since I think it's unnecessary. In any case, it's certainly doing us no good as MAX_INT64.
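
          A minimal sketch of the "oldest lastSentTime goes first" idea with a plain TreeMap; the names are invented and timestamps are assumed unique enough for a demo:

            import java.util.TreeMap;

            public class OldestFirstPicker {
              // last FBR time in ms -> datanode UUID
              private final TreeMap<Long, String> byLastFbrTime = new TreeMap<>();

              public void recordFullBlockReport(String datanodeUuid, long nowMs) {
                byLastFbrTime.values().remove(datanodeUuid);  // drop the old timestamp, if any
                byLastFbrTime.put(nowMs, datanodeUuid);
              }

              /** The DN that has gone longest without sending an FBR. */
              public String pickNext() {
                return byLastFbrTime.isEmpty() ? null : byLastFbrTime.firstEntry().getValue();
              }
            }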

          clamb Charles Lamb added a comment -

          Thanks for the review and comments Colin P. McCabe.

            public static final String  DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_KEY = "dfs.namenode.max.concurrent.block.reports";
            public static final int     DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_DEFAULT = Integer.MAX_VALUE;
          

          It seems like this should default to something less than the default number of RPC handler threads, not to MAX_INT. Given that dfs.namenode.handler.count = 10, it seems like this should be no more than 5 or 6, right? The main point here to avoid having the NN handler threads completely choked with block reports, and that is defeated if the value is MAX_INT. I realize that you probably intended this to be configured. But it seems like we should have a reasonable default that works for most people.

          Actually, my intent was for this feature not to kick in unless it was configured, but you have said that you want it enabled by default. I've changed the default of the above setting to 6.

          +  /* Number of block reports currently being processed. */
          +  private final AtomicInteger blockReportProcessingCount = new AtomicInteger(0);
          

          I'm not sure an AtomicInteger makes sense here. We only modify this variable (write to it) when holding the FSN lock in write mode, right? And we only read from it when holding the FSN in read mode. So, there isn't any need to add atomic ops.

          Actually, it is incremented outside the FSN lock; otherwise it could never be > 1.

          I think we need to track which datanodes we gave the "green light" to, and not decrement the counter until they either send that report, or some timeout expires. (We need the timeout in case datanodes go away after requesting permission-to-send.) The timeout can probably be as short as a few minutes. If you can't manage to send an FBR in a few minutes, there's more problems going on.

          I've added a map called 'pendingBlockReports' to BlockManager to track the datanodes that we've given the "ok" to as well as when we gave it to them. There's also a method to clean the table.

            public static final String  DFS_BLOCKREPORT_MAX_DEFER_MSEC_KEY = "dfs.blockreport.max.deferMsec";
            public static final long    DFS_BLOCKREPORT_MAX_DEFER_MSEC_DEFAULT = Long.MAX_VALUE;
          

          Do we really need this config key?

          I've added a TreeBidiMap called lastBlockReportTime to track this. I would have used Guava instead of apache.commons.collections, but Guava doesn't have a sorted BidiMap.

          cmccabe Colin P. McCabe added a comment -

          I posted a new patch with some changes from the previous approach. Since Datanodes can go away at any time after the NN gives them the green light, this patch adds the concept of leases for block reports. Leases have a fixed time length... if the DN can't send its block report within that time, it loses the lease. I also added a new fault injection framework to monitor what is going on in the BlockManager. There was some milliseconds / seconds confusion in the existing initial block report delay code that I fixed (might want to split this off into a separate JIRA...)
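
          For illustration, a stripped-down sketch of the lease idea: a lease is just an id plus an expiry, and an FBR that arrives after the expiry (or with the wrong id) no longer counts against the limit. This is an editor's sketch, not the BlockReportLeaseManager API from the patch:

            import java.util.HashMap;
            import java.util.Map;

            public class BlockReportLeases {
              private static final class Lease {
                final long id;
                final long expiryMs;
                Lease(long id, long expiryMs) { this.id = id; this.expiryMs = expiryMs; }
              }

              private final long leaseLengthMs;
              private final int maxPending;
              private final Map<String, Lease> leasesByDatanode = new HashMap<>();
              private long nextLeaseId = 1;

              public BlockReportLeases(long leaseLengthMs, int maxPending) {
                this.leaseLengthMs = leaseLengthMs;
                this.maxPending = maxPending;
              }

              /** Returns a lease id, or 0 if the DN should retry on a later heartbeat. */
              public synchronized long requestLease(String datanodeUuid, long nowMs) {
                leasesByDatanode.values().removeIf(l -> l.expiryMs <= nowMs);  // expire stale leases
                if (leasesByDatanode.size() >= maxPending) {
                  return 0;
                }
                Lease lease = new Lease(nextLeaseId++, nowMs + leaseLengthMs);
                leasesByDatanode.put(datanodeUuid, lease);                     // replaces any old lease
                return lease.id;
              }

              /** Returns true if a block report carrying this lease id should be accepted. */
              public synchronized boolean checkLease(String datanodeUuid, long leaseId, long nowMs) {
                Lease lease = leasesByDatanode.get(datanodeUuid);
                return lease != null && lease.id == leaseId && lease.expiryMs > nowMs;
              }
            }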

          andrew.wang Andrew Wang added a comment -

          Neato patch, Colin. This is a high-ish level review; I probably need to do another pass.

          Small stuff:

          • Missing config key documentation in hdfs-default.xml
          • requestBlockReportLeaseId: empty catch for unregistered node, we could add some more informative logging rather than relying on the warn below

          BlockReportLeaseManager

          • I discussed the NodeData structure with Colin offline, wondering why we didn't use a standard Collection. Colin brought up the reason of reducing garbage, which seems valid. I think we should consider implementing IntrusiveCollection though rather than writing another.
          • I also asked about putting NodeData into DatanodeDescriptor. Not sure what the conclusion was on this, it might reduce garbage since we don't need a separate NodeData object.
          • I prefer Precondition checks for invalid configuration values at startup, so there aren't any surprises for the user. Not everyone reads the messages on startup.
          • requestLease has a check for isTraceEnabled, then logs at debug level

          BPServiceActor:

          • In offerService, we ignore the new leaseID if we already have one. On the NN though, a new request wipes out the old leaseID, and processReport checks based on leaseID rather than node. This kind of bug makes me wonder why we really need the leaseID at all, why not just attach a boolean to the node? Or if it's in the deferred vs. pending list?
          • Can we fix the javadoc for scheduleBlockReport to mention randomness, and not "send...at the next heartbeat?" Incorrect right now.
          • Have you thought about moving the BR scheduler to the NN side? We still rely on the DNs to jitter themselves and do the initial delay, but we could have the NN handle all this. This would also let the NN trigger FBRs whenever it wants. We could also do better than random scheduling, i.e. stride it rather than jitter. Incompatible, so we probably won't, but fun to think about
          • scheduleBlockReport(long): do we want to add a checkArgument that delayMs is >= 0? You nixed the else case.

          DatanodeManager:

          • Could we do the BRLManager register/unregister in addDatanode and removeDatanode? I think this is safe, since on a DN restart it'll provide a lease ID of 0 and FBR, even without a reg/unreg.
          cmccabe Colin P. McCabe added a comment -

          Missing config key documentation in hdfs-defaults.xml

          added

          requestBlockReportLeaseId: empty catch for unregistered node, we could add some more informative logging rather than relying on the warn below

          added

          I discussed the NodeData structure with Colin offline, wondering why we didn't use a standard Collection. Colin brought up the reason of reducing garbage, which seems valid. I think we should consider implementing IntrusiveCollection though rather than writing another.

          yes, there will be quite a few of these requests coming in at any given point. IntrusiveCollection is an interface rather than an implementation, so I don't think it would help here (it's most useful when an element needs to be in multiple lists at once, and when you need fancy operations like finding the list from the element)

          I also asked about putting NodeData into DatanodeDescriptor. Not sure what the conclusion was on this, it might reduce garbage since we don't need a separate NodeData object.

          The locking is easier to understand if all the lease data is inside BlockReportLeaseManager.

          I prefer Precondition checks for invalid configuration values at startup, so there aren't any surprises for the user. Not everyone reads the messages on startup.

          ok

          requestLease has a check for isTraceEnabled, then logs at debug level

          fixed

          In offerService, we ignore the new leaseID if we already have one. On the NN though, a new request wipes out the old leaseID, and processReport checks based on leaseID rather than node. This kind of bug makes me wonder why we really need the leaseID at all, why not just attach a boolean to the node? Or if it's in the deferred vs. pending list?

          It's safer for the NameNode to wipe the old lease ID every time there is a new request. It avoids problems where the DN went down while holding a lease, and then came back up. We could potentially also avoid those problems by being very careful with node (un)registration, but why make things more complicated than they need to be? I do think that the DN should overwrite its old lease ID if the NN gives it a new one, for the same reason. Let me change it to do that... Of course this code path should never happen since the NN should never give a new lease ID when none was requested. So calling this a "bug" seems like a bit of a stretch.

          I prefer IDs to simply checking against the datanode UUID, because lease IDs allow us to match up the NN granting a lease with the DN accepting and using it, which is very useful for debugging or understanding what is happening in production. It also makes it very obvious whether a DN is "cheating" by sending a block report with leaseID = 0 to disable rate-limiting. This is a use-case we want to support but we also want to know when it is going on.

          Can we fix the javadoc for scheduleBlockReport to mention randomness, and not "send...at the next heartbeat?" Incorrect right now.

          I looked pretty far back into the history of this code. It seems to go back to at least 2009. The underlying ideas seem to be:
          1. the first full block report can have a configurable delay in seconds expressed by dfs.blockreport.initialDelay
          2. the second full block report gets a random delay between 0 and dfs.blockreport.intervalMsec
          3. all other block reports get an interval of dfs.blockreport.intervalMsec unless the previous block report had a longer interval than expected... if the previous one had a longer interval than expected, the next one gets a shorter interval.

          We can keep behavior #1... it's simple to implement and may be useful for testing (although I think this patch makes it no longer necessary).

          Behavior #2 seems like a workaround for the lack of congestion control in the past. In a world where the NN rate-limits full block reports, we don't need this behavior to prevent FBRs from "clumping". They will just naturally not overly clump because we are rate-limiting them.

          Behavior #3 just seems incorrect, even without this patch. By definition, a full block report contains all the information the NN needs to understand the DN state. Just because block report interval N was longer than expected, seems no reason to shorten block report interval N+1. In fact, this behavior seems like it could lead to congestion collapse... if the NN gets overloaded and can't handle block reports for some time, a bunch of DNs will shorten the time in between the current block report and the next one, further increasing total NN load. Not good. Not good at all.

          I replaced this with a simple "randomize the first block report time between 0 and dfs.blockreport.initialDelay, then try to do all other block reports after dfs.blockreport.intervalMsec ms" scheme. If the full block report interval was more than 2x what was configured, we whine about it in the log file (this should only happen if the NN is under extreme load).
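
          A small sketch of that simplified schedule (an invented helper, not the BPServiceActor code): a random first report within the initial delay, then a fixed interval, with a warning when the observed gap exceeds 2x the configured interval:

            import java.util.concurrent.ThreadLocalRandom;

            public class SimpleBlockReportSchedule {
              private final long initialDelayMs;  // dfs.blockreport.initialDelay, converted to ms
              private final long intervalMs;      // dfs.blockreport.intervalMsec
              private long nextReportMs = -1;
              private long lastSentMs = -1;

              public SimpleBlockReportSchedule(long initialDelayMs, long intervalMs) {
                this.initialDelayMs = initialDelayMs;
                this.intervalMs = intervalMs;
              }

              /** First FBR lands at a random point within [now, now + initialDelay). */
              public long nextFullBlockReportTime(long nowMs) {
                if (nextReportMs < 0) {
                  long jitter = initialDelayMs > 0
                      ? ThreadLocalRandom.current().nextLong(initialDelayMs) : 0;
                  nextReportMs = nowMs + jitter;
                }
                return nextReportMs;
              }

              /** After each FBR, simply schedule the next one a full interval later. */
              public void onFullBlockReportSent(long nowMs) {
                if (lastSentMs >= 0 && nowMs - lastSentMs > 2 * intervalMs) {
                  System.err.println("WARN: interval between full block reports exceeded "
                      + "2x the configured value");
                }
                lastSentMs = nowMs;
                nextReportMs = nowMs + intervalMs;
              }
            }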

          Have you thought about moving the BR scheduler to the NN side? We still rely on the DNs to jitter themselves and do the initial delay, but we could have the NN handle all this. This would also let the NN trigger FBRs whenever it wants. We could also do better than random scheduling, i.e. stride it rather than jitter. Incompatible, so we probably won't, but fun to think about

          Yeah, we could do more on the NN to ensure fairness and so forth. I think it's not really a big problem since datanodes aren't evil, and the existing method of configuring BR period on the datanode side seems to be working well. We also tend to assume uniform cluster load in HDFS, an assumption which makes complex BR scheduling less interesting. But maybe some day...

          Could we do the BRLManager register/unregister in addDatanode and removeDatanode? I think this is safe, since on a DN restart it'll provide a lease ID of 0 and FBR, even without a reg/unreg.

          Seems reasonable. I moved the BRLM register/unregister calls from registerDatanode into addDatanode / removeDatanode.

          andrew.wang Andrew Wang added a comment -

          Nits:

          • Should the checkLease logs be done to the blockLog? We log the startup error log there in processReport
          • Update javadoc in BlockReportContext with what leaseID is for.
          • Add something to the log message about overwriting the old leaseID in offerService. Agree that this shouldn't really trigger, but good defensive coding practice
          • DatanodeManager: there's still a register/unregister in registerDatanode that I think we could skip. This is the node restart case where it's registered previously.
          • BRLManager requestLease, we auto-register the node on requestLease. This shouldn't happen since DNs need to register before doing anything else. We can keep this here
          • Still need documentation of new config keys in hdfs-default.xml

          Block report scheduling:

          • We removed TestBPSAScheduler#testScheduleBlockReportImmediate, should this swap over to testing forceFullBlockReport?
          • Extra import in TestBPSAScheduler and BPSA
          • I'm worried about convoy effects if we don't stick to the stride system of the old code. I think of the old code as follows:
          1. Choose a random time within the "initialDelay" interval to jitter
          2. Attempt to block report at that same time every hour.

          This keeps the BRs from all the DNs spread out, even if the NN gets temporarily backed up. Once the NN catches up and flushes its backlog of FBRs, future BRs will still be nicely spread out.

          My understanding of your new scheme is that after a DN successfully BRs, it'll BR again an hour afterwards. So, if all the BRs piled up and then are processed in quick succession, all the DNs will BR at about the same time next hour. Since we want to spread the BRs out across the hour, this is not good.

          Other ideas are to round up to the next stride. Or, wait an interval plus a random delay. We might consider some congestion control too, where the DNs backoff linearly or exponentially. All these schemes delay the FBRs, but maybe we trust IBRs enough now.
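
          For comparison, a sketch of the stride idea: each DN keeps a fixed offset within the reporting interval, so even if several reports pile up and are flushed together, the next reports snap back onto evenly spread slots. The names and the rounding rule below are the editor's assumptions, not the old or new HDFS code:

            import java.util.concurrent.ThreadLocalRandom;

            public class StridedBlockReportSchedule {
              private final long intervalMs;  // e.g. dfs.blockreport.intervalMsec (assumed > 0)
              private final long offsetMs;    // this DN's fixed slot within each interval

              public StridedBlockReportSchedule(long intervalMs) {
                this.intervalMs = intervalMs;
                this.offsetMs = ThreadLocalRandom.current().nextLong(intervalMs);
              }

              /** Next report time: the next occurrence of this DN's slot strictly after now. */
              public long nextFullBlockReportTime(long nowMs) {
                long intervalStart = (nowMs / intervalMs) * intervalMs;
                long candidate = intervalStart + offsetMs;
                return candidate > nowMs ? candidate : candidate + intervalMs;
              }
            }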

          If you want to pursue this logic change more, let's split it out into a follow-on JIRA. The rest LGTM, +1 pending above comments.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 13s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 15 new or modified test files.
          +1 javac 9m 5s There were no new javac warning messages.
          +1 javadoc 11m 19s There were no new javadoc warning messages.
          +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 36s The applied patch generated 25 new checkstyle issues (total was 1365, now 1380).
          -1 whitespace 0m 9s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 53s mvn install still works.
          +1 eclipse:eclipse 0m 40s The patch built with eclipse:eclipse.
          +1 findbugs 3m 43s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 native 3m 35s Pre-build of native portion
          -1 hdfs tests 100m 55s Tests failed in hadoop-hdfs.
              152m 39s  



          Reason Tests
          Failed unit tests hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
            hadoop.hdfs.TestSetrepDecreasing
          Timed out tests org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12736276/HDFS-7923.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 6aec13c
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11169/testReport/
          Java 1.7.0_55
          uname Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11169/console

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Should the checkLease logs be done to the blockLog? We log the startup error log there in processReport

          I like having the ability to turn on and off TRACE logging for various subsystems. Putting everything in the blockLog would make that harder, right?

          Update javadoc in BlockReportContext with what leaseID is for.

          added

          Add something to the log message about overwriting the old leaseID in offerService. Agree that this shouldn't really trigger, but good defensive coding practice

          ok

          DatanodeManager, there's still a register/unregister in registerDatanode I think we could skip. This is the node restart case where it's registered previously.

          Good catch. The calls in removeDatanode and addDatanode should take care of this, so there's no need to have it here.

          BRLManager requestLease, we auto-register the node on requestLease. This shouldn't happen since DNs need to register before doing anything else. We can keep this here

          I added a warn message since this shouldn't happen.

          Still need documentation of new config keys in hdfs-default.xml

          added

          Extra import in TestBPSAScheduler and BPSA

          removed

          If you want to pursue [the block report timing] logic change more, let's split it out into a follow-on JIRA. The rest LGTM, +1 pending above comments.

          OK, I will restore the old behavior for now, and we can do this in a follow-on change.

          cmccabe Colin P. McCabe added a comment -

          I need to make sure to treat the initial block report delay as being in seconds, not milliseconds. Thanks to Andrew for pointing this out. Updating with patch 6.

          andrew.wang Andrew Wang added a comment -

          Thanks Colin, +1 pending Jenkins. Great work on this one.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 2s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 15 new or modified test files.
          +1 javac 7m 31s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 14s The applied patch generated 23 new checkstyle issues (total was 1341, now 1355).
          -1 whitespace 0m 8s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 3m 19s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 native 3m 19s Pre-build of native portion
          -1 hdfs tests 167m 15s Tests failed in hadoop-hdfs.
              213m 57s  



          Reason Tests
          Failed unit tests hadoop.hdfs.TestSafeMode
            hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
            hadoop.hdfs.server.blockmanagement.TestDatanodeManager
            hadoop.hdfs.web.TestWebHDFS
            hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting
            hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
            hadoop.hdfs.TestSetrepDecreasing
            hadoop.hdfs.server.datanode.TestBpServiceActorScheduler
            hadoop.hdfs.server.namenode.TestNameNodeRecovery



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738040/HDFS-7923.005.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 2dbc40e
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11247/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11247/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11247/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11247/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11247/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 17m 35s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 15 new or modified test files.
          +1 javac 7m 29s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 14s The applied patch generated 23 new checkstyle issues (total was 1342, now 1356).
          -1 whitespace 0m 8s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 32s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 15s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 native 3m 14s Pre-build of native portion
          -1 hdfs tests 162m 11s Tests failed in hadoop-hdfs.
              208m 16s  



          Reason Tests
          Failed unit tests hadoop.hdfs.server.datanode.TestBpServiceActorScheduler
            hadoop.hdfs.server.blockmanagement.TestDatanodeManager
            hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting
            hadoop.hdfs.server.namenode.TestNamenodeCapacityReport



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738044/HDFS-7923.006.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 2dbc40e
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11248/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11248/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11248/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11248/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11248/console

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          The new patch fixes the unit test failures. TestBpServiceActorScheduler needed to be reverted to the old version (since the BR timing change was also removed from the latest patch revision). TestDatanodeManager needed to be tweaked, because of the way it uses mocks, to avoid a null pointer exception. There was also a bug in BlockReportLeaseManager where removing elements from its internal linked lists threw an NPE, which was causing unit test failures.
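          The NPE described above is typical of a hand-rolled intrusive doubly-linked list whose removal path dereferences prev and next unconditionally. The following is a minimal, hypothetical sketch of that failure mode and of a boundary-aware fix; the class and field names (IntrusiveListSketch, NodeData, datanodeId) are invented for illustration and are not taken from the actual BlockReportLeaseManager code.

          // Hypothetical sketch only -- not the real BlockReportLeaseManager list code.
          public class IntrusiveListSketch {
            static class NodeData {
              final String datanodeId;
              NodeData prev;
              NodeData next;
              NodeData(String datanodeId) { this.datanodeId = datanodeId; }
            }

            private NodeData head;
            private NodeData tail;

            void addLast(NodeData node) {
              if (tail == null) {
                head = tail = node;
              } else {
                tail.next = node;
                node.prev = tail;
                tail = node;
              }
            }

            // Buggy removal: assumes every node has both neighbours, so it
            // dereferences null when the node sits at the head or tail.
            void removeBuggy(NodeData node) {
              node.prev.next = node.next;   // NPE if node is the head
              node.next.prev = node.prev;   // NPE if node is the tail
              node.prev = node.next = null;
            }

            // Fixed removal: update head/tail pointers when the node is at an end.
            void remove(NodeData node) {
              if (node.prev != null) {
                node.prev.next = node.next;
              } else {
                head = node.next;           // node was the head
              }
              if (node.next != null) {
                node.next.prev = node.prev;
              } else {
                tail = node.prev;           // node was the tail
              }
              node.prev = node.next = null;
            }

            public static void main(String[] args) {
              IntrusiveListSketch list = new IntrusiveListSketch();
              NodeData a = new NodeData("dn-a");
              NodeData b = new NodeData("dn-b");
              list.addLast(a);
              list.addLast(b);
              list.remove(a);               // boundary-aware removal succeeds
              try {
                list.removeBuggy(b);        // b is now both head and tail, so prev is null
              } catch (NullPointerException e) {
                System.out.println("naive remove() throws NPE on a boundary node");
              }
            }
          }

          Whatever the exact shape of the real fix, the point of the sketch is that list removal has to handle the head and tail cases explicitly.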

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 17m 43s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 16 new or modified test files.
          +1 javac 7m 36s There were no new javac warning messages.
          +1 javadoc 9m 41s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 11s The applied patch generated 23 new checkstyle issues (total was 1341, now 1355).
          -1 whitespace 0m 8s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 3m 16s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 native 3m 15s Pre-build of native portion
          +1 hdfs tests 160m 55s Tests passed in hadoop-hdfs.
              207m 19s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738456/HDFS-7923.007.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 84ba1a7
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11280/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11280/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11280/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11280/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11280/console

          This message was automatically generated.

          andrew.wang Andrew Wang added a comment -

          +1 pending. Some of the checkstyle issues look related, though, so please give them a look. The whitespace we can fix up at commit time.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8011 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8011/)
          HDFS-7923. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (cmccabe) (cmccabe: rev 12b5b06c063d93e6c683c9b6fac9a96912f59e59)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
            Add HDFS-7923 to CHANGES.txt (cmccabe: rev 46b0b4179c1ef1a1510eb04e40b11968a24df485)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          cmccabe Colin P. McCabe added a comment -

          Committed. Let's fix the checkstyle issues in a follow-on.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #227 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/227/)
          HDFS-7923. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (cmccabe) (cmccabe: rev 12b5b06c063d93e6c683c9b6fac9a96912f59e59)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
            Add HDFS-7923 to CHANGES.txt (cmccabe: rev 46b0b4179c1ef1a1510eb04e40b11968a24df485)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #957 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/957/)
          HDFS-7923. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (cmccabe) (cmccabe: rev 12b5b06c063d93e6c683c9b6fac9a96912f59e59)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
            Add HDFS-7923 to CHANGES.txt (cmccabe: rev 46b0b4179c1ef1a1510eb04e40b11968a24df485)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2155 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2155/)
          HDFS-7923. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (cmccabe) (cmccabe: rev 12b5b06c063d93e6c683c9b6fac9a96912f59e59)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
            Add HDFS-7923 to CHANGES.txt (cmccabe: rev 46b0b4179c1ef1a1510eb04e40b11968a24df485)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #216 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/216/)
          HDFS-7923. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (cmccabe) (cmccabe: rev 12b5b06c063d93e6c683c9b6fac9a96912f59e59)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
            Add HDFS-7923 to CHANGES.txt (cmccabe: rev 46b0b4179c1ef1a1510eb04e40b11968a24df485)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #225 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/225/)
          HDFS-7923. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (cmccabe) (cmccabe: rev 12b5b06c063d93e6c683c9b6fac9a96912f59e59)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
            Add HDFS-7923 to CHANGES.txt (cmccabe: rev 46b0b4179c1ef1a1510eb04e40b11968a24df485)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2173 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2173/)
          HDFS-7923. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages (cmccabe) (cmccabe: rev 12b5b06c063d93e6c683c9b6fac9a96912f59e59)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/HeartbeatResponse.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesBlockReportPerStorage.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/RegisterCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestNNHandlesCombinedBlockReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockReportContext.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
            Add HDFS-7923 to CHANGES.txt (cmccabe: rev 46b0b4179c1ef1a1510eb04e40b11968a24df485)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          sanjay.radia Sanjay Radia added a comment -

How will you ensure that a particular DN does not get starved? i.e., how do you guarantee that BRs will get through? HDFS depends on periodic BRs for correctness. I recall discussions with Facebook where they changed their HDFS to use incremental BRs but still kept full BRs at a lower frequency, just for safety.

          Hide
          sureshms Suresh Srinivas added a comment -

I just noticed this. Daryn Sharp and I have talked about improving block report processing efficiency. The fact that it can take close to 300 ms is a concern. This change works around that, and I see cases where DataNodes can be starved. Daryn Sharp, any comments?

          Hide
          sanjay.radia Sanjay Radia added a comment -

          Have you considered a pull model (NN pulls) which does not risk starvation?

          Hide
          cmccabe Colin P. McCabe added a comment - - edited

Starvation is not a real concern here. Imagine a 1000-node cluster where full block reports are 6 hours apart. Then the NN needs to be able to handle 2.7 full block reports a minute. If each one takes 500 ms (we'll be pessimistic), then 1.35 out of every 60 seconds is FBR time, or 2.3% of the time. If you want to be even more pessimistic and assume each block report is 1 hour apart rather than 6, just multiply that number by 6 to get 13.8% of the time.

          For starvation to happen, you'd have to be spending close to 100% of the time on full block reports. That's just not going to happen. And if it does happen, you have bigger problems, like not being able to actually do anything on the NameNode (since you're spending all your time on FBRs, which hold the FSN write lock).

          Even if you were spending close to 100% of the time on full block reports, the existing code doesn't enforce fairness... I can configure one DN to send full block reports every 30 minutes, and configure everyone else to send every 10 hours. The FBR period is a datanode-side configuration, not a NN-side one.

This change is really helpful during startup on big clusters. In the past we have seen restarting all the DNs at once on a several-hundred-node cluster bring the NN to its knees. All of the RPC handlers get flooded with FBRs, but only one can make progress at a time. The flood of FBRs also triggers full GCs, since we can't handle them in a timely fashion and they enter the old gen. I realize that dfs.blockreport.initialDelay was designed as a workaround, but it is difficult to know what value to set it to, it results in slower startup, and it is often overlooked in real-world deployments.

          If we want to work on enforcing fairness on the NN-side, we can do that, but it seems unrelated to this change to me. It's also not something we currently do, so it would be nice to see data showing that it was helpful.
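For concreteness, here is a back-of-the-envelope check of the load estimate above, written as a standalone snippet; it only illustrates the arithmetic and is not code from the patch.

public class FbrLoadEstimate {
  public static void main(String[] args) {
    int datanodes = 1000;
    double reportIntervalHours = 6.0;   // full block report interval per DN
    double processingSeconds = 0.5;     // pessimistic per-FBR processing time

    // How many FBRs arrive per minute, and how much NN time they consume.
    double reportsPerMinute = datanodes / (reportIntervalHours * 60.0); // ~2.8
    double busySecondsPerMinute = reportsPerMinute * processingSeconds; // ~1.4
    double percentBusy = 100.0 * busySecondsPerMinute / 60.0;           // ~2.3%

    System.out.printf("FBRs/min: %.2f, FBR time: %.2f s/min (%.1f%% of NN time)%n",
        reportsPerMinute, busySecondsPerMinute, percentBusy);
  }
}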

          Hide
          sanjay.radia Sanjay Radia added a comment -

          This change is really helpful during startup on big clusters. In the past we have seen restarting all the DNs at once on a several hundred node cluster bring the NN to its knees.

There is already a random backoff for the initial block report. You can configure the initial BR backoff time. When that JIRA was done there was a proposal to give each DN a different backoff time depending on the number of outstanding BRs; this enhancement was not done at the time because the backoff worked very well. For a several-hundred-node cluster, the initial BR backoff time should be approximately 60 seconds.
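For illustration, a minimal sketch of setting that existing knob to roughly the suggested value, assuming it is done programmatically; in practice dfs.blockreport.initialDelay is normally set in hdfs-site.xml on the DataNodes, and its value is in seconds.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class InitialBlockReportDelayExample {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // Each DN picks a random delay in [0, initialDelay) seconds for its first FBR.
    conf.setLong("dfs.blockreport.initialDelay", 60);
    System.out.println(conf.get("dfs.blockreport.initialDelay"));
  }
}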

          Hide
          cmccabe Colin P. McCabe added a comment -

          BTW it would be relatively easy to enforce "fairness" on the NN side in this patch. In fact I had an earlier version that did do that. It would give out FBR leases to datanodes based on a round-robin order. But I decided against it since it was extra complexity without a good reason. In particular, I was worried that I'd create starvation if the set of DNs the NN selected to report in didn't want to report in (perhaps because they were using initialDelay...). Since full block report periods can be very long this is a real concern and would necessitate a hack like a timeout. It was much easier to keep block report scheduling as a datanode-side thing.

          We can revisit this if it ever seems like a good idea, but I bet that you will not be able to create starvation in any real cluster without creating conditions that would make the cluster unusable no matter what.

          Hide
          andrew.wang Andrew Wang added a comment -

If we're really worried about starvation, we could add a failsafe to the DN side that sets a lease ID of 0 after, say, three FBR intervals. This will skip the rate limiting.

          Overall though I agree with Colin, we get starvation when a NN remains continually fully loaded with FBR work from other nodes. Such a cluster would be basically unusable for real work. The default # concurrent FBRs is also >1, so we should be gated on NN CPU rather than the much slower heartbeat interval.
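A minimal sketch of that failsafe, using hypothetical field and method names (fullBlockReportLeaseId, lastFullBlockReportMs, sendFullBlockReport) rather than the actual BPServiceActor code; this only illustrates the proposed idea and is not the committed behavior.

import org.apache.hadoop.util.Time;

class FbrFailsafeSketch {
  long blockReportIntervalMs;    // dfs.blockreport.intervalMsec
  long lastFullBlockReportMs;    // when the last FBR was successfully sent
  long fullBlockReportLeaseId;   // lease granted via heartbeat; 0 means no lease

  void maybeSendFullBlockReport() {
    long now = Time.monotonicNow();
    if (fullBlockReportLeaseId != 0) {
      // Normal path: the NN granted a lease in a heartbeat response.
      sendFullBlockReport(fullBlockReportLeaseId);
    } else if (now - lastFullBlockReportMs > 3 * blockReportIntervalMs) {
      // Failsafe: no lease for three FBR intervals; lease ID 0 would skip
      // the NN-side rate limiting.
      sendFullBlockReport(0);
    }
  }

  void sendFullBlockReport(long leaseId) {
    // Placeholder for the blockReport() RPC carrying this lease ID in its
    // BlockReportContext.
  }
}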

          Hide
          cmccabe Colin P. McCabe added a comment -

This change is important for avoiding cascading failures (aka congestion collapse). Currently, when the NN gets too many full block reports at once, the extra block reports slow down the processing of the existing ones (because storing the large RPCs generates GC activity, up to and including full GCs). So you get into a negative spiral: can't process FBRs fast enough? Then you get some more FBRs, which slow you down even more. And so on. Keep in mind that with the previous code, the DN would send its full block report all over again if the NN didn't respond within some timeout, which could lead to the NN having multiple (large) copies of the same full block report queued up. It's true that you could usually avoid these scenarios by careful configuration and tuning, but this kind of fragile congestion-collapse behavior should not be in the system. This change is also important for maintaining any sort of reasonable quality of service on the NN, since otherwise we can get completely flooded with FBRs and can't do any other work.

          Hide
          sanjay.radia Sanjay Radia added a comment -

Starvation... I didn't literally mean starvation, but was more concerned about fairness and about safety. Can a DN's block report be delayed for a significant period of time, or, due to a subtle bug, even longer? Our current implementation is very resilient: DNs just send the BRs at a specific period irrespective of the NN. Does your design have a safety net, say, a DN will wait a max of 2 periods to get permission (or something like that)?

          Hide
          cmccabe Colin P. McCabe added a comment -

          Can a DN's block report be delayed for some significant period of time or due to subtle bug even long times

          So it's important to distinguish between sending block reports and processing block reports. This patch delays sending block reports, but it should not delay processing block reports by any significant amount. The idea is that in general sending a bunch of block reports that can't be processed until much later is bad (for the reasons discussed above like GC problems, lack of RPC handler threads, memory consumption, etc.) But the patch should keep the FBRs flowing pretty regularly... we will still queue up 6 of them on the NN even though we can only process 1 at once.

          Does your design have a safety net - say a DN will wait a max of 2 periods to get permission (or something like that).

          This is kind of like a traffic light, right? If the traffic light is red for a long time, there must be a problem somewhere else in the system. But the solution can't be to slam on the accelerator when the red light lasts too long. You'll just crash, especially in a traffic jam.

          Maybe car analogies are taking it too far, but hopefully you can see what I'm saying. I think sending FBRs when the system is not ready for them is a really bad behavior. It leads to congestion collapse, which is much worse than starving a few DNs for a while.

          Hmm. What if we had a metric which was the average length of time the DN had to wait before sending a full block report that it wanted to send? Management systems could follow this metric and raise an alert when the time got too high. Then the admin can do something to solve the problem.
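A minimal sketch of how such a wait-time metric could be tracked on the DN side; the class and method names here are hypothetical and this is not part of the patch.

class FbrWaitMetricSketch {
  private long totalWaitMs;
  private long reportsSent;
  private long wantedToSendAtMs = -1;  // set when the DN first decides an FBR is due

  synchronized void onFbrDue(long nowMs) {
    if (wantedToSendAtMs < 0) {
      wantedToSendAtMs = nowMs;
    }
  }

  synchronized void onFbrSent(long nowMs) {
    if (wantedToSendAtMs >= 0) {
      totalWaitMs += nowMs - wantedToSendAtMs;
      reportsSent++;
      wantedToSendAtMs = -1;
    }
  }

  // Average wait in ms before a wanted FBR could actually be sent; a
  // management system could alert when this grows too large.
  synchronized double avgWaitMs() {
    return reportsSent == 0 ? 0.0 : (double) totalWaitMs / reportsSent;
  }
}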

          Hide
          jnp Jitendra Nath Pandey added a comment -

          Colin P. McCabe, has this feature been tested at scale? Could you please give some details about testing done on this?
          Also, is there a way we can disable this feature?

          Hide
          cmccabe Colin P. McCabe added a comment -

          We have tested this on large (300 node) clusters. If you want to disable this, then you can simply set dfs.namenode.max.full.block.report.leases to a very large value. Then you can have the old behavior where there is no rate-limiting on the block reports that come into the NameNode. I'm not sure why you would want that behavior, though.
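For illustration, a minimal sketch of that, assuming the property is set programmatically; in practice it would normally go in hdfs-site.xml on the NameNode.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DisableFbrRateLimiting {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // A very large lease count effectively restores the old, unthrottled behavior.
    conf.setInt("dfs.namenode.max.full.block.report.leases", Integer.MAX_VALUE);
    System.out.println(conf.get("dfs.namenode.max.full.block.report.leases"));
  }
}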


            People

            • Assignee:
              cmccabe Colin P. McCabe
              Reporter:
              cmccabe Colin P. McCabe
            • Votes:
              0
              Watchers:
              18
