Hadoop HDFS / HDFS-4114

Remove the BackupNode and CheckpointNode from trunk

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Incompatible change

      Description

      Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the BackupNode and CheckpointNode.

      1. HDFS-4114.000.patch
        216 kB
        Haohui Mai
      2. HDFS-4114.001.patch
        216 kB
        Haohui Mai
      3. HDFS-4114.patch
        230 kB
        Suresh Srinivas

        Issue Links

          Activity

          Konstantin Shvachko added a comment -

          Yes I do use BackupNode.
          -1 on this idea.

          Eli Collins added a comment -

          Hey Konst,
          I'll re-purpose this jira to just remove the CheckpointNode.

          Todd Lipcon added a comment -

          Konstantin: could you please elaborate on how you use the BackupNode?

          As discussed in the thread, it's difficult to see how it's usable in its current state, and there has been no work in Apache to move it forward.

          Here are the issues I see with the backupnode:

          • It doesn't provide a hot standby, since it doesn't get any block information. I've seen your prototype using a "load duplicator", but that software is not available in Apache, and I don't think it would correctly handle the majority of the corner cases we had to solve during HDFS-1623 development.
          • Even with the above addressed, there is no functionality to "promote" a backup node to active, so it doesn't provide HA at all.
          • Because it uses RPC to transfer edits, it ties the availability and response time of the Active to the availability and response time of the Backup. Up until recently (HDFS-3126) there was no RPC timeout configured at all on the backup stream, so if the backup lost its network connection or otherwise froze, the active would freeze for several minutes if not indefinitely. Thus it actually reduces availability in all currently released branches.

          After adding the timeout, there is now the possibility that the active and backup are not synchronized. Without external synchronization there is no way to know whether the two nodes are synchronized, and thus even if we had a way to promote the backup, there'd be no safe way to do so automatically without risking rollback of the namespace. So the backup cannot be used for automatic failover in its current form without substantial design changes.

          • Even if you are using the BN in an older version or a private fork, it is clear that you aren't maintaining it in current releases. The backupnode tests were failing for many months earlier this year with no one stepping up to fix them. Other contributors have had to step in and maintain the code, eg with fixes like HDFS-2666, HDFS-2764, HDFS-3625.

          So, to summarize, please justify your -1 with an explanation of how you are using the BackupNode to provide some feature which is not already more mature and production-ready elsewhere in Hadoop 2.x.
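
          [Editorial aside: the timeout hazard in the third bullet can be illustrated outside Hadoop with a plain-socket sketch. `TimeoutDemo` and its behavior are invented for illustration; this is not Hadoop's RPC layer.]

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Toy illustration (not Hadoop code) of why a missing read timeout on the
// active->backup edit stream lets a frozen backup hang the active node.
public class TimeoutDemo {
    static String probe() throws Exception {
        // A "backup" that accepts the connection but never sends a byte,
        // like a backup node that froze after connecting.
        try (ServerSocket backup = new ServerSocket(0);
             Socket active = new Socket("localhost", backup.getLocalPort())) {
            active.setSoTimeout(200); // without this, read() below blocks forever
            try (InputStream in = active.getInputStream()) {
                in.read(); // waits for a response that never comes
                return "unexpected data";
            } catch (SocketTimeoutException e) {
                return "timed out instead of hanging";
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe());
    }
}
```

          Before HDFS-3126, the analogous read on the backup stream had no such timeout, so the caller (the active NN) blocked for as long as the peer stayed frozen.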

          Todd Lipcon added a comment -

          Another example from today where the BackupNode test is failing and taking contributors' time: HDFS-4138

          Konstantin Shvachko added a comment -

          > I'll re-purpose this jira to just remove the CheckpointNode.

          I wonder what that means. CheckpointNode is just a role of the BackupNode in which it performs checkpoints like SNN and does not keep the in-memory state in sync with the primary NN.
          So changing the subject doesn't change the purpose.

          Eli:
          From a formal perspective you cannot just remove something from core Hadoop. You first need to deprecate it, and then you may remove it in the next major version. That is the rule I have been following for the last 7 years. Let me know if it has changed recently. That is also why the SNN in particular was not removed but deprecated; otherwise we would have had a more efficient checkpointing engine, see below.

          Todd:
          I see BackupNode as a better way of creating checkpoints. SNN uploads the image and the edits from NN, merges them in memory and then sends back the new checkpoint.
          BN only needs to saveNamespace() from memory and then send back the new image. This avoids the network traffic and local disk IOs of uploading two large files. I have seen the NameNode on multiple large clusters run much slower while a checkpoint is in progress.
          It is beneficial for HDFS performance to switch from SNN to BN for checkpointing. Therefore I would advocate re-re-deprecating SNN instead of removing BN.
          I accept your criticism that BackupNode code path was getting less attention from me personally and the community at large. Will have to work on that on my side.
          I would be glad to go into design discussion and potential enhancements of BackupNode with you. Would appreciate it given your experience with HA, as I believe the HA story for Hadoop isn't over with the implementation of Quorum Journal.
          Although this issue is not about that. Sticking to the point: what are your arguments for removing (or better, deprecating) BN besides that it has bugs? Software tends to have bugs. E.g. you do not propose to remove BlockScanner just because it couldn't be fixed over a series of jiras.
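
          [Editorial aside: the two checkpoint flows described above can be contrasted with a back-of-the-envelope model. `CheckpointTrafficModel` and its byte counts are illustrative only, not Hadoop code.]

```java
// Toy model (illustrative only, not Hadoop code) of the network bytes moved
// per checkpoint under the two schemes discussed in this thread.
public class CheckpointTrafficModel {

    // SNN flow: download the image (unless a local copy is cached, cf.
    // HDFS-1458) plus the edits from the NN, merge in memory, then upload
    // the new merged image back to the NN.
    static long snnTraffic(long imageBytes, long editsBytes, boolean imageCached) {
        long download = (imageCached ? 0 : imageBytes) + editsBytes;
        long upload = imageBytes; // merged checkpoint sent back to the NN
        return download + upload;
    }

    // BN flow: edits are already streamed continuously, so a checkpoint is a
    // local saveNamespace() plus uploading only the new image.
    static long bnTraffic(long imageBytes) {
        return imageBytes;
    }

    public static void main(String[] args) {
        long image = 10_000_000_000L, edits = 500_000_000L; // illustrative sizes
        System.out.println("SNN, no image cache: " + snnTraffic(image, edits, false));
        System.out.println("SNN, image cached:   " + snnTraffic(image, edits, true));
        System.out.println("BN checkpoint:       " + bnTraffic(image));
    }
}
```

          Under this toy model, once the image is cached locally the SNN's remaining overhead relative to the BN is the edits download.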

          Todd Lipcon added a comment -

          I see BackupNode as a better way of creating checkpoints. SNN uploads the image and the edits from NN, merges them in memory and then sends back the new checkpoint.
          BN only needs to saveNamespace() from memory and then send back the new image. This avoids the network traffic and local disk IOs of uploading two large files. I have seen the NameNode on multiple large clusters run much slower while a checkpoint is in progress.
          It is beneficial for HDFS performance to switch from SNN to BN for checkpointing. Therefore I would advocate re-re-deprecating SNN instead of removing BN.

          This argument seems to be predicated on the idea that the SecondaryNameNode doesn't keep the image in memory between checkpoints, and that it downloads the image from the NN anew for each checkpoint. This hasn't been the case since HDFS-1458 in 0.23, which made a small improvement to the 2NN to solve the problem you're pointing out.

          I would be glad to go into design discussion and potential enhancements of BackupNode with you. Would appreciate it given your experience with HA, as I believe the HA story for Hadoop isn't over with the implementation of Quorum Journal.

          Feel free to ping me if you have any questions on the HA design or implementation - always happy to help.

          Although this issue is not about that. Sticking to the point: what are your arguments for removing (or better, deprecating) BN besides that it has bugs? Software tends to have bugs. E.g. you do not propose to remove BlockScanner just because it couldn't be fixed over a series of jiras.

          The BackupNode doesn't provide any feature that is not provided better by other pieces of code. Your argument about efficiency isn't valid given HDFS-1458.

          The BlockScanner argument is a silly one: it has had some bugs, but there is no alternative available which doesn't have bugs, so a buggy piece of code is better than no piece of code. If someone had written a new BlockScanner which offered more features and fewer bugs, I'd absolutely advocate removing it.

          Kihwal Lee added a comment -

          If a piece of code is not actively maintained and is redundant to or superseded by a better maintained one, I think deprecating and eventually removing it is definitely an option. I believe this is what some people feel about BackupNode. If someone starts contributing changes that are beneficial and meaningful to the community, the situation can change. But it hasn't happened for quite some time.

          I accept your criticism that BackupNode code path was getting less attention from me personally and the community at large. Will have to work on that on my side.

          I hope you do, and that great features and improvements come out of it in the future. Then we will be able to reset the "it's-getting-old-let's-get-rid-of-it" timer on BackupNode. How about starting to deprecate it in 2.0? If any new development based on BackupNode results in unique features before the next major release, we will un-deprecate it.
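
          [Editorial aside: the proposed lifecycle follows the standard Java deprecation convention of annotating in one release and removing in the next major one. `BackupNodeDaemon` below is a hypothetical stand-in, not the real Hadoop class.]

```java
// Hypothetical sketch of the proposed lifecycle: annotate as deprecated in
// 2.0, remove in the next major release. The class name is invented here.
@Deprecated // since 2.0; slated for removal in the next major release
public class BackupNodeDaemon {
    // Callers get a compile-time deprecation warning, while existing
    // deployments keep working for the remainder of the 2.x line.
    public static boolean isDeprecated() {
        // @Deprecated has runtime retention, so this check works reflectively.
        return BackupNodeDaemon.class.isAnnotationPresent(Deprecated.class);
    }

    public static void main(String[] args) {
        System.out.println("deprecated: " + isDeprecated());
    }
}
```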

          Todd Lipcon added a comment -

          I hope you do, and that great features and improvements come out of it in the future. Then we will be able to reset the "it's-getting-old-let's-get-rid-of-it" timer on BackupNode. How about starting to deprecate it in 2.0? If any new development based on BackupNode results in unique features before the next major release, we will un-deprecate it.

          That sounds like a good plan to me. Does that seem like an agreeable compromise?

          Eli Collins added a comment -

          Sounds good to me.

          Konstantin Shvachko added a comment -

          > Your argument about efficiency isn't valid given HDFS-1458.

          It is. SNN still needs to upload edits. BN does not need to upload anything.

          > The BlockScanner argument is a silly one:

          Todd, would you please try to keep this more civilized.
          It is not an argument, it is an example.

          > it has had some bugs

          That is my point. BN has some bugs as well, introduced while the code evolved. It is not a valid reason to remove it.

          > How about start deprecating it in 2.0?

          I disagree. It is just twisting the same idea. You are deprecating it in the past to remove it in the present.

          There is no alternative to BackupNode as an HA solution without shared storage, besides checkpointing.
          I'll look at the test cases listed here.
          I will re-purpose HDFS-2064 for 0.23 and submit a patch soon. Starting from Hadoop 0.23 there is no need for load replicator, as DNs can talk to multiple NameNodes internally.

          Suresh Srinivas added a comment -

          In federation, we wanted to use CheckpointNode for sharing a single checkpointer for all the namenodes in a cluster, triggered based on crontab. I have not had time to explore this further.

          Looking at the discussion so far, I understand removing/deprecating if no one plans to use or work on BackupNode. Given that Konstantin has indicated interest in pursuing it, I would leave it as it is right now. Konstantin, it would be great if you can share your plans and timelines. Meanwhile, for next few months, if any issues crop up, I volunteer to spend my time fixing them.

          Eli Collins added a comment -

          Konstantin, are you actually interested in pursuing this? The BackupNode hasn't been touched in years.

          Konstantin Boudnik added a comment -

          Also, from a pure precedent standpoint, post-factum deprecation (i.e. after a release has happened) is a bad idea. I am in full agreement with Konstantin here that the past practice should stand and a deprecated feature has to be kept around for at least one major release.

          Todd Lipcon added a comment -

          In federation, we wanted to use CheckpointNode for sharing a single checkpointer for all the namenodes in a cluster, triggered based on crontab

          My question is why that doesn't apply to the SecondaryNameNode just as well? The 2NN and the CheckpointNode offer identical functionality, except that one has been battle-tested whereas the other hasn't been run in production AFAIK.

          Suresh Srinivas added a comment -

          My question is why that doesn't apply to the SecondaryNameNode just as well?

          At least at the time when I was considering it, Secondary Namenode was a long running daemon and Checkpointer was shaping up to be a utility.

          Konstantin Shvachko added a comment -

          > The BackupNode hasn't been touched in years.

          What do you mean? HDFS-3625 "touched" it just in July.
          I actually made BackupNode work for Hadoop-0.22. I don't see why it is not possible for 0.23.

          > it would be great if you can share your plans and timelines.

          I will be making changes during next couple of weeks. Will get the general idea of the scoping in the process.

          Kihwal Lee added a comment -

          Konstantin, I welcome your renewed commitment to BackupNode. Currently 0.23 and 2.0/trunk are significantly different due to the HA work. You might want to base your effort on branch-2/trunk, otherwise it could soon become obsolete again.

          Eli Collins added a comment -

          What do you mean? HDFS-3625 "touched" it just in July.
          I actually made BackupNode work for Hadoop-0.22. I don't see why it is not possible for 0.23.

          I mean that it hasn't seen any development activity committed aside from fixes to the tests (eg HDFS-3625), I've not seen any users try it, and none of the active developers maintain it.

          I don't see how the BackupNode is useful in its current state (just a node that you can stream edits to). The goal for the BackupNode was to evolve from its current state to become a StandbyNode, but that effort has, several years later, not made much progress. Eg HDFS-2064 has been patch available for a year on a branch that, IIUC, no one plans to release from. The only 0.22 based system I'm aware of has since moved off of 0.22.

          Sanjay Radia added a comment -

          Konstantin, can you volunteer to fix any failing tests related to BackupNode?

          Konstantin Shvachko added a comment -

          Yes. Please forward them my way.

          Konstantin Shvachko added a comment -

          Does it make sense to refactor BN into a separate package?
          I think it does.

          Suresh Srinivas added a comment -

          Konstantin Shvachko, it's been a while since there has been a discussion about this jira. I do not know of many people using BackupNode. Is this a good time to start the discussion about deprecating and removing support for BackupNode? Do you still think we need to retain the support for BackupNode?

          The main reason for this is: if this functionality is not used by anyone, maintaining it adds unnecessary work. As an example, when I added support for the retry cache, there were a bunch of code paths related to BackupNode that added unnecessary work.

          Suresh Srinivas added a comment -

          I am assigning this jira to myself.

          Suresh Srinivas added a comment -

          Early patch. A bunch of cleanup is still left in the code and the protobuf proto files.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617121/HDFS-4114.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5643//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5643//console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          Suresh Srinivas, I missed your comment of Nov 8 while travelling, sorry, still recovering.
          As it stands today BackupNode is the only extension of the NameNode in the current code base.
          It still provides important bindings to downstream projects that I believe we both care about: by its mere existence and test coverage.
          You are right it's been a while and I have a debt to provide proper ones, which is on my todo list.
          I understand the burden of supporting it, and in the meantime want to reiterate my readiness to promptly address any related issues.
          LMK if I missed or can help with any.

          Glanced through your patch.
          Saw some things that probably fall as collateral damage, like documentation about "Import Checkpoint", which is not related to BN.
          But, thanks, it nicely scopes for me the essence of the bindings required.
          If you wish we can assign this issue to me so that I could take care of it in the future.

          Suresh Srinivas added a comment - edited

          As it stands today BackupNode is the only extension of the NameNode in the current code base.

          I do not think that is a sufficient reason to retain the BackupNode. If you really want to show how the NameNode can be extended, you could contribute another simpler, easier-to-maintain example that extends the NameNode. In fact, I reckon some of the constructs that are used only by the BackupNode are not what extensions of the NameNode would use. Some examples:

          1. It uses JournalProtocol and NamenodeProtocol to coordinate checkpointing. This is no longer necessary with the improvements in edits, where checkpointing can be done at any time without the need to roll, start checkpoint, end checkpoint.
          2. A lot of code in FSImage and FSEditLog caters to this, just for the BackupNode. This code is not well documented and adds unnecessary complexity.

          As you can see from the early patch, we can remove approximately 5000 lines of code. This code belongs to functionality that no one tests or uses. In fact, I would not be surprised if there are bugs lurking in that functionality that might cause major issues for a misguided user who ends up using it.

          Given that, I believe the BackupNode should be removed. As for whether any code that helps extend the NameNode is being removed, I would like to see a proposal on what extending a NameNode means, and which functionality relevant to that is being removed in my patch.

          You are right it's been a while and I have a debt to provide proper ones, which is on my todo list.

          I fail to understand what the plans for the BackupNode are and why it is relevant anymore. Describing that would help.

          If you wish we can assign this issue to me so that I could take care of it in the future.

          I wish just assigning a bug to you were that easy. When making changes to the code with a feature in mind, there is a lot of this unused code, and tests, that also need to change. This is currently a tax that feature developers are paying. The folks working on a feature have a time frame that they are working towards. Having to depend on you for related changes means having to coordinate the work with you and get it done within the timeline. This will be work not only for you, but also for the people working on features. It is hard for me to see why we should spend all that effort.

          I can give you a few examples where folks had to do all this unnecessary work:

          • When we did protobuf support, we had to add support for all the protocols that are used only by the BackupNode.
          • In HA, considerable coding and testing effort went into supporting BackupNode.
          • Recently, when I worked on the retry cache, I spent a lot of time just understanding how all this works and added support for retriability.
          • I also know that Jing Zhao and Haohui Mai spent time on BackupNode specific functionality when working on http policy and https support related cleanup.

          Unless there are justified reasons for retaining this functionality, regular contributors to HDFS will have to continue paying this cost. We have waited almost a year for a plan for taking the BackupNode forward. I also think that with NameNode HA stabilizing, even if there is a plan, I am not sure how relevant it would be.

          A suggestion is to move this functionality to GitHub, where you could maintain it as HDFS changes. This is in essence equivalent to having you maintain BackupNode-related functionality for features added to HDFS, without the cost of coordination.

          Haohui Mai added a comment -

          Rebased

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12639295/HDFS-4114.000.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6624//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6624//console

          This message is automatically generated.

          Jing Zhao added a comment -

          The rebased patch looks good to me.
          Now we have made more changes to BackupNode-related code to make it compile with the changes caused by HA upgrade support, rolling upgrade support, and HDFS-3405. I guess no tests have been done to make sure the BackupNode still works. Maybe it's now a good time for us to remove the BackupNode from trunk and deprecate it in branch-2.

          Suresh Srinivas added a comment -

          Maybe it's now a good time for us to remove BackupNode from trunk and deprecate it in branch-2.

          Can we create a separate jira for branch-2 to deprecate the BackupNode? Also, we should change the hadoop start/stop scripts to print a deprecation warning.
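          A minimal sketch of what such a script-level warning might look like (the function name, subcommand names, and message wording here are illustrative, not the actual Hadoop script code):

          ```shell
          #!/usr/bin/env bash
          # Hypothetical sketch: emit a deprecation warning when a deprecated
          # subcommand is invoked from a start/stop script. Names are illustrative.
          print_deprecation_warning() {
            case "$1" in
              backupnode|checkpointnode)
                # Warn on stderr so normal script output is unaffected.
                echo "WARNING: the '$1' command is deprecated and will be removed in a future release." >&2
                ;;
            esac
          }

          print_deprecation_warning backupnode
          ```

          A one-line call like this near the top of the subcommand dispatch would surface the deprecation to anyone still starting a BackupNode, without changing the script's behavior otherwise.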

          Also, the patch I had posted earlier had some remaining code. I am not sure if Haohui Mai has removed all references to the BackupNode completely. We can delete any lingering unused code in subsequent jiras as well.

          Jing Zhao added a comment -

          Can we create a separate jira for branch-2 to deprecate BackupNode?

          Created HDFS-6212 for branch-2.

          Jing Zhao added a comment -

          Haohui Mai, there are still several TODOs in the current rebased patch. How do you plan to address them?

          Haohui Mai added a comment -

          The v1 patch cleans up some of the TODOs. There are a couple of TODOs that require changes in the protobuf files. I'll clean them up in a subsequent jira.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12639451/HDFS-4114.001.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6633//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6633//console

          This message is automatically generated.

          Jing Zhao added a comment - edited

          There are a couple TODOs that require changes in the protobuf files. I'll clean them up in a subsequent jira.

          Sounds good to me. There are still a couple of methods that need to be cleaned, such as endCheckpoint. But we can do it in subsequent jiras. The 001 patch looks good to me as a first step. +1.

          Haohui Mai added a comment -

          Thanks Jing Zhao for the review. I'll commit it to trunk this weekend.

          Konstantin Shvachko added a comment -

          > I'll commit it to trunk this weekend.

          Please don't. My -1 is still active.
          Unless you want to ignore it, which probably means anarchy in the project.

          Konstantin Boudnik added a comment -

          I don't understand the rush, guys. There's a legit use of the mechanism as Konstantin has stated earlier in the JIRA. It might not be widely used at the moment, but it provides certain benefits to some of us in the community.
          Perhaps there is a certain overhead to maintaining the code. Let's look at the actual overhead of keeping this code around: the BackupImage class has had two tiny patches since February 2013, BackupJournalManager has been touched 5 times in 2+ years, BackupNode was modified 6 times in about the same period, and so on.
          Konstantin has repeatedly asked to send all the maintenance issues his way. Why isn't this a satisfactory approach?

          Suresh Srinivas added a comment -

          Konstantin Boudnik and Konstantin Shvachko, I commented more than 5 months ago, responding with questions - https://issues.apache.org/jira/browse/HDFS-4114?focusedCommentId=13841600&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13841600. After a long silence, patches have been posted to take this jira further.

          I don't understand the rush, guys. There's a legit use of the mechanism as Konstantin has stated earlier in the JIRA. It might not be widely used at the moment, but it provides certain benefits to some of us in the community.

          BackupNode was introduced in 2008/2009. I do not see anyone using it. Konstantin Boudnik, have you used it?

          Coming to benefits, can you explain what they are? Please review my comments. I do not see any benefits at this time, other than serving to confuse and create unnecessary work. What are the plans you have for BackupNode and please help us understand the reason for retaining it.

          BackupImage class had two tiny patches since February 2013. BackupJournalManager has been touched 5 times in 2+ years. BackupNode was modified 6 times in about the same time, and so on.

          BackupNode-related code does not exist only in these classes. It is in FSEditLog, FSImage, etc. as well. There are also many unnecessary protocols. Please review the posted patch to see the extent of the code involved. If you count all these places, there are many other changes that had to be made during the HA, retry cache, FSImage and edit log related work. That said, even a single unnecessary change made to parts that are no longer relevant is one too many.

          This issue has been brought up by several folks. Active committers to HDFS no longer see any reason to retain this functionality. Other active committers and contributors, if you disagree, please speak up.

          These are possible things I can see going forward:

          • Please post responses to my comments earlier. Please post what you plan to do with BackupNode. Please justify your veto based on technical reasons.
          • We can comment out the code, or we can start creating jiras for BackupNode changes only to be picked up by Konstantin Shvachko. The problem I see with the second option is having to wait for Konstantin Shvachko to make progress on any work that requires a change to the BackupNode.

          Going silent and not responding to comments in a timely manner and only responding when code changes are about to be made (with someone putting all the effort to make the changes) is not being nice to others in the community.

          Konstantin Shvachko added a comment -

          Suresh Srinivas.

          1. I thought I answered all of your questions here in this jira and in the hdfs-dev@ thread that was happening around the same time.
            I thoroughly reviewed the comment you are citing. It does not contain questions, except for a rhetorical one, but it explains your reasoning well. My reasoning is summarized in this comment, which, by the way, contains a review comment on your earlier patch that remained unaddressed.
            I think the problem here is not that I am not answering your questions, but that my answers do not favour your reasoning, which is the nature of a disagreement in general.
          2. This jira was explicitly vetoed, and there was no consensus reached yet. I did not and could not know that somebody is putting an effort or doing work on it.
          3. I proposed my help in fixing BackupNode issues many times. Suresh Srinivas, please go ahead and file jiras and assign them to me. I'll do my best to address those as promptly as I can.
          Suresh Srinivas added a comment -

          Konstantin Shvachko, the comment I was talking about was after the one you pointed me to, here.

          That said, I am no longer going to spend much time and energy on this. I will create jiras if necessary and point you to them.

          Konstantin Shvachko added a comment -

          Sure thing. Please assign them to me so that I see them in my mailbox sooner, and to avoid any confusion.


            People

            • Assignee: Suresh Srinivas
            • Reporter: Eli Collins
            • Votes: 0
            • Watchers: 21