Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: all
    • Fix Version/s: unscheduled
    • Component/s: src
    • Labels:
      None

      Description

      Some people want the ability to permanently remove a file, dir, or revision from
      history forever.  This means rewriting the whole filesystem;  a big deal.  It
      would also break existing working copies.
      
      But hey, some folks want security.  :-)
      


          Activity

          sussman Ben Collins-Sussman added a comment -

          1 week estimated, post-1.0
          

          cmpilato C. Michael Pilato added a comment -

          Some IRC chat leads to the proposition that perhaps this functionality
          belongs in svnadmin?
          

          subversion-importer Subversion Importer added a comment -

          Well, hopefully, you'd add it to libsvn_ra if you could.
          Additionally, it's not just a security issue, it's also a repository
          maintenance issue. Sometimes it's handy to be able to obliterate the
          revision data for some revisions of binary files after you've just
          got done "tag"ing, or "label"ing the code. Quite a space saver.
          

          Original comment by rassilon

          sussman Ben Collins-Sussman added a comment -

          Yes, 'obliteration' of repository data *is* a repository maintenance 
          issue, which is why it has to be part of 'svnadmin', not the RA 
          layer.  Here's why.
          
          Simply put, permanent removal of a revision or node causes a vast 
          chain reaction in the filesystem.  Nodes are written as deltas 
          against one another;  the whole fs would need to 'rewrite' itself, 
          and this would probably take a non-trivial amount of time.  This 
          means bringing the repository off-line while this work is done... and 
          there's a high probability that all existing working copies will be 
          invalidated, since they'll be referring to invalid node ids.
          
          Thus this isn't something a client can casually do via RA.
          
          
          
          

          brane Branko Čibej added a comment -

          > Nodes are written as deltas 
          > against one another;  the whole fs would need to 'rewrite' itself, 
          
          That's not quite true. You only have to re-delta the nodes that
          are deltas against the obliterated node. That's hardly the whole
          filesystem -- usually it'll only be said node's immediate predecessor(s).
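To make Branko's point concrete, here is a toy model (not Subversion's actual BDB or FSFS representation format) in which every stored node is either a fulltext or a delta against exactly one base node. Removing a node only forces a rewrite of the nodes stored as deltas against it; nodes further down a chain keep their original deltas because their base's contents do not change. The prefix/suffix delta encoding and all names here (`Rep`, `make_delta`, `obliterate`) are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Toy delta: keep `prefix` chars and `suffix` chars of the base, put `middle` between.
Delta = tuple[int, int, str]


@dataclass
class Rep:
    base: Optional[str]       # id of the node this rep is a delta against; None = fulltext
    data: Union[str, Delta]   # fulltext if base is None, otherwise a Delta


def make_delta(base: str, target: str) -> Delta:
    """Encode target relative to base as (common prefix, common suffix, new middle)."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    while s < limit - p and base[-1 - s] == target[-1 - s]:
        s += 1
    return (p, s, target[p:len(target) - s])


def apply_delta(base: str, delta: Delta) -> str:
    p, s, middle = delta
    return base[:p] + middle + base[len(base) - s:]


def fulltext(store: dict[str, Rep], node: str) -> str:
    """Reconstruct a node's contents by walking its delta chain."""
    rep = store[node]
    if rep.base is None:
        return rep.data
    return apply_delta(fulltext(store, rep.base), rep.data)


def obliterate(store: dict[str, Rep], victim: str) -> list[str]:
    """Delete `victim`, re-deltifying only the nodes stored as deltas against it.

    Returns the ids of the nodes that had to be rewritten.
    """
    new_base = store[victim].base  # dependents get re-pointed at the victim's own base
    dependents = [n for n, rep in store.items() if rep.base == victim]
    rewritten = {}
    for node in dependents:
        text = fulltext(store, node)
        if new_base is None:
            rewritten[node] = Rep(None, text)  # nothing left to delta against
        else:
            rewritten[node] = Rep(new_base, make_delta(fulltext(store, new_base), text))
    store.update(rewritten)
    del store[victim]
    return dependents
```

In this model, obliterating the middle of a four-node chain rewrites one node, not four. Storing a dependent as a fulltext when the victim had no base is also roughly what metx suggests further down ("changing that revision's data to be the base data and not just a delta").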
          
          
          

          brane Branko Čibej added a comment -

          Just to clarify: instead of removing the whole node, remove just the
          contents and make the node "dead" -- maybe a new type of node. This
          means that dirs referring to the node don't have to be changed; they just
          don't show the node at all on checkout.
          
          Update of working copies that refer to that node would work exactly as
          if the node had been deleted.
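A minimal sketch of the "dead node" idea, assuming a toy directory model in which each entry carries a `dead` flag (the `Entry` type and helper functions are hypothetical, not Subversion APIs). The directory itself is never rewritten: a fresh checkout just skips dead entries, and an update reports them as deletions to any working copy that still has them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    contents: Optional[bytes]  # becomes None once the node is obliterated
    dead: bool = False


def obliterate_contents(entry: Entry) -> None:
    """Drop the contents but keep the directory entry, marking it dead."""
    entry.contents = None
    entry.dead = True


def checkout_listing(directory: list[Entry]) -> list[str]:
    """Names a fresh checkout receives: dead entries are simply not shown."""
    return [e.name for e in directory if not e.dead]


def update_deletions(directory: list[Entry], working_copy: set[str]) -> set[str]:
    """Entries an existing working copy must remove, exactly as if they had been deleted."""
    return {e.name for e in directory if e.dead and e.name in working_copy}
```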
          
          

          subversion-importer Subversion Importer added a comment -

          Yes it's expensive, yes it's obnoxious, yes it's necessary. However, 
          saying you want to put it into svnadmin is just an excuse because SVN 
          hasn't yet implemented ACLs and the Sentinel mechanism. SVN is a 
          client/server app. Why shouldn't I be able to remotely administer it?
          

          Original comment by rassilon

          gstein Greg Stein added a comment -

          It isn't an excuse for anything -- Ben was making his comment based on
          the length of time involved and that it would be safest to perform
          while the system was offline.
          
          Yes, the ACL stuff needs to be completed, but that isn't (necessarily)
          the reason to avoid RA.
          
          Personally, I think we can and should do it through RA. The issues
          that Ben mentions can all be handled.
          

          subversion-importer Subversion Importer added a comment -

          Another approach to consider for a clean way to permanently remove any
          arbitrary code: revert the db files, perhaps from a backup restored
          from logfiles to the point right before the "bad code" was entered,
          and then replay the commits against the database, perhaps with a
          null commit to take the commit number of the "bad code" commit.  This
          is the only way, it seems to me, to guarantee that the transactional
          guarantees of commits remain and that the removed code doesn't cause
          other consistency problems.  If the svnadmin tool could provide a nice
          automated way to be able to both revert the db files (if sufficient
          backups are around on the server) and replay patches in some
          automated, easy-to-process way, that would be awesome.  But that might
          elevate this to a well-past-1.0 enhancement request.  :)
          
          Brian
          
          

          Original comment by brian
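The replay idea can be sketched as follows, assuming commits are modeled as simple records of revision number, log message, and changed paths (the `Commit` type and `replay_without` function are hypothetical). Putting an empty placeholder commit in each removed commit's slot is what keeps every later revision number unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    revision: int
    log: str
    changes: dict[str, str] = field(default_factory=dict)  # path -> new contents


def replay_without(history: list[Commit], bad_revisions: set[int]) -> list[Commit]:
    """Rebuild history, replacing each bad commit with an empty "null commit".

    Later commits keep their original revision numbers because the
    placeholder still occupies the bad commit's slot.
    """
    rebuilt = []
    for commit in history:
        if commit.revision in bad_revisions:
            rebuilt.append(Commit(commit.revision, "(obliterated)"))
        else:
            rebuilt.append(Commit(commit.revision, commit.log, dict(commit.changes)))
    return rebuilt
```

This ignores the harder case where a later commit modifies or copies something the removed commit introduced; a real replay would have to detect and reject that.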

          gstein Greg Stein added a comment -

          Replaying commits into an old database would invalidate existing 
          working copies (the version resource URLs would be off), just like 
          most solutions to "svn obliterate".
          
          Note that we already have a way to prevent the old URLs from being 
          accidentally used and causing problems: there is a component in those 
          URLs that the server admin can change. When a database change occurs 
          which might alter the node IDs, then that component can be changed 
          using the SVNSpecialURI directive. That automatically invalidates any 
          cached version resource URI "out in the field".
          
          

          striker Sander Striker added a comment -

          *** Issue 1260 has been marked as a duplicate of this issue. ***
          

          kfogel Karl Fogel added a comment -

          *** Issue 1848 has been marked as a duplicate of this issue. ***
          

          subversion-importer Subversion Importer added a comment -

          What if you redefined the requirements to make it more practical
          to implement?  Specifically, building off  Branko Cibej's comment,
          how about just marking a file as obliterated?
          
          That mark would prevent the server from ever giving out
          any information about that file.  The file would also
          not be included in any 'svnadmin dump' output.  However,
          the data would sit benignly in the repo until someone does
          a dump/load sequence at a much later date.  The documentation
          would have to be clear that the server administrator could
          still read sensitive data, even if no users can.
          
          Since it would not really obliterate the data, maybe the name
          should be changed to 'svn redact' or something.  A true
          obliterate could still be considered in a far future release.
          
          
          

          Original comment by jrobbins

          subversion-importer Subversion Importer added a comment -

          A relevant example of the need for "obliterate" recently happened on tigris.org.
          
          Some users decided to check into CVS a total of 225 MB of tool binaries
          used in their project: IDEs, webserver, database server, etc.  That's bad news
          for CVS.  SVN would handle it much better, but even so, it is just a matter of
          time before someone accidentally imports their entire C: drive or Unix / directory.
          
          For a small team repository, the admin can manually deal with it.  But, in a
          larger SVN installation, it would be better to somehow obliterate the unwisely
          added files.  In this use case, I think that it would be acceptable to (a) just
          mark the commits as being obliterated or "redacted", and (b) drop the data for
          those files and replace it with a short file (e.g., "this version of this file
          has been redacted").  I don't think that you need to erase all history of the
          commit or invalidate anyone's working copy.
          

          Original comment by jrobbins
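A minimal sketch of the redaction variant described above, using a toy in-memory repository (the `ToyRepo` class, its methods, and the placeholder text are hypothetical). The path and revision stay in history, so revision numbers and working copies remain valid; only the stored bytes are swapped for a short notice.

```python
PLACEHOLDER = b"this version of this file has been redacted\n"


class ToyRepo:
    def __init__(self) -> None:
        self._contents: dict[tuple[str, int], bytes] = {}
        self.redacted: set[tuple[str, int]] = set()

    def commit(self, path: str, rev: int, data: bytes) -> None:
        self._contents[(path, rev)] = data

    def redact(self, path: str, rev: int) -> None:
        """Replace one file revision's contents with the placeholder and mark it."""
        key = (path, rev)
        self._contents[key] = PLACEHOLDER  # raises KeyError if it never existed
        self.redacted.add(key)

    def cat(self, path: str, rev: int) -> bytes:
        return self._contents[(path, rev)]

    def size(self) -> int:
        """Total bytes of stored file contents."""
        return sum(len(b) for b in self._contents.values())
```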

          subversion-importer Subversion Importer added a comment -

          Couldn't this be implemented in such a way that, at the revision where
          obliteration occurs, the data is erased and sort of pushed up one revision,
          changing that revision's data to be the base data and not just a delta?
          
          This feature would be important for long-running projects.  I'd hope that it'd
          stay within the 'svn' app and not be required to be in the 'svnadmin' app.  For
          projects hosted on other servers where svnadmin isn't accessible, it's
          none-too-fun to deal with cleanup.  If I were able to access svnadmin, I coulda
          just done a dump/load sequence with a dumpfilter.
          
          Perhaps Subversion needs to have an additional user-access level beyond read and
          read/write for minor administration.
          

          Original comment by metx

          kfogel Karl Fogel added a comment -

          Note that this feature could mean any subset of a) zap a single node-revision,
          b) zap an entire node's lineage, c) zap all instances of a given path in all
          revisions, d) zap entire revisions.
          
          Is it about removing the content (the "confidentiality" solution), or about
          getting rid of any trace of the path or content ever having existed (the
          "lawyer" solution)?  It may be that some situations call for one behavior, and
          others call for another.
          
          In other words, we have some specification work to do before we can implement this.
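The four scopes above select genuinely different sets of data. A minimal sketch, assuming a toy history in which each node-revision is a (revision, path, node id) record and a node keeps its id across copies and renames (a simplification; the `NodeRev`, `Scope`, and `select` names are hypothetical):

```python
from dataclasses import dataclass
from enum import Enum, auto


@dataclass(frozen=True)
class NodeRev:
    revision: int
    path: str
    node_id: str  # toy assumption: unchanged across copies/renames of one lineage


class Scope(Enum):
    NODE_REVISION = auto()  # a) one node-revision
    LINEAGE = auto()        # b) every node-revision of one node, under any path
    PATH = auto()           # c) whatever lived at one path, in every revision
    REVISION = auto()       # d) everything in one revision


def select(history: list[NodeRev], scope: Scope, *, node_id: str = "",
           path: str = "", revision: int = 0) -> list[NodeRev]:
    """Return the node-revisions an obliterate of the given scope would remove."""
    if scope is Scope.NODE_REVISION:
        return [n for n in history if n.node_id == node_id and n.revision == revision]
    if scope is Scope.LINEAGE:
        return [n for n in history if n.node_id == node_id]
    if scope is Scope.PATH:
        return [n for n in history if n.path == path]
    return [n for n in history if n.revision == revision]
```

With a file that was tagged and later replaced by an unrelated file at the same path, the lineage scope catches the tag copy but not the replacement, while the path scope does the opposite.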
          

          subversion-importer Subversion Importer added a comment -

          This is now the fifth most requested feature (by votes) though the issue itself
          seems to be idle for a couple of months now. May I humbly request a target
          milestone for this issue or at least an updated estimate of how much work this
          would require? The second comment estimated the work at 1 week; I assume this is
          no longer the case?
          

          Original comment by cowwoc

          kfogel Karl Fogel added a comment -

          One week isn't even enough to spec out this feature from a design/features
          standpoint.  As my comment above yours indicates, it's not settled exactly what
          this feature would do.  A lot of specification work is needed; and since there's
          a workaround (using 'svnadmin dump' and 'svnadmin load'), no developer has felt
          that work to be high enough priority to start the conversation.  I know that's
          disappointing, but in a sense, people have been "voting" with their code -- that
          is, voting to do other things :-).
          
          The vote counts on these issues are not to be trusted, by the way.  The
          developers don't pay much attention to them, preferring to watch the users@
          mailing list, and therefore many people don't bother to vote on issues important
          to them.  If you see N votes on an issue, all you can conclude is that N people
          voted for it.  Its priority relative to other issues is a much more complex
          question, one which the vote count doesn't help answer.
          
          Hope this helps,
          -Karl
          

          subversion-importer Subversion Importer added a comment -

          The problem with 'svnadmin dump' and 'svnadmin load' is that you cannot specify
          wildcards for files/directories to be obliterated. If you add such an ability, I'd
          be more than happy to use that instead of the obliterate command.
          

          Original comment by cowwoc

          blair Blair Zajac added a comment -

          Take a look at the svndumpfilter command.  It'll include or exclude
          portions of your dump according to the paths you specify on the command
          line.
          
          

          subversion-importer Subversion Importer added a comment -

          Absolutely, but last time I checked it didn't support wildcards. Has that changed?
          

          Original comment by cowwoc

          cmpilato C. Michael Pilato added a comment -

          No, that hasn't changed.  But the ability to accept wildcards, or better yet,
          regular expressions, would be a welcome addition to svndumpfilter (and, I
          imagine, notably easy to pull off).
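Until svndumpfilter accepts patterns itself, one workaround is to expand the wildcards beforehand and pass the resulting literal paths to `svndumpfilter exclude`, which treats each argument as a path prefix. A minimal sketch using Python's `fnmatch`, assuming you already have a list of every path that appears anywhere in the repository's history (how you collect it is left open; the function name is hypothetical):

```python
from fnmatch import fnmatchcase


def expand_patterns(paths: list[str], patterns: list[str]) -> list[str]:
    """Return the paths matching any shell-style pattern, sorted.

    Paths nested under an already-matched directory are dropped, since
    svndumpfilter treats each argument as a prefix anyway.
    """
    matched = sorted({p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)})
    result: list[str] = []
    for p in matched:
        if not any(p.startswith(parent.rstrip("/") + "/") for parent in result):
            result.append(p)
    return result
```

The output can then be spliced into `svnadmin dump REPOS | svndumpfilter exclude PATH... > filtered.dump`. Note that `fnmatch`'s `*` also matches `/`, so `*.exe` catches files at any depth.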
          

          subversion-importer Subversion Importer added a comment -

          As part of my overall product management, I have to maintain and distribute BIOS
          and FPGA loads from other build processes along with the C/C++ code that my teams
          are developing.  I've set up a Subversion directory hierarchy so these "other"
          load builders can check their resulting binary loads into my repository.  I can
          now tag and manage the project as a whole. As these binary files become stale, I
          would like to delete them from the repository altogether and not just from the
          revision tree. For me, the 'svn obliterate' functionality is a needed
          requirement to help administer the size of the repository.  Reading through
          these comments, I believe deleting the contents of the file while maintaining
          the file name within the repository revision tree could be a quick and simple
          means of implementing this feature.  I would probably just add a property
          indicating the file was obsoleted. Anyhow, this feature request has my vote.
          

          Original comment by brentrwebster

          subversion-importer Subversion Importer added a comment -

          We've just encountered a need for this issue. Someone using Subversion for the
          first time just committed 340MB of garbage to the repository -- the *.exe files
          for their IDE, the Java 1.5 JDK, the J2EE 1.4 SDK, etc. None of this was
          sensitive information, but it does mean that our *tiny* little repository has
          suddenly BALLOONED in size, for no good reason.
          
          Considering that this feature was originally requested in 2001, I'm surprised
          that 5 years later it still hasn't been implemented. I think the reason is that
          it isn't an issue that is encountered often by those who are experienced with
          SVN (i.e. all of the developers on this project), but by those who are brand new
          to SVN (e.g. our group).
          
          I understand the desire not to permanently remove stuff willy-nilly, but
          absolutely no purpose is served by having 340MB of garbage sitting there "just
          in case" we need it back. If we need to get back the Java 1.5 JDK, we know where
          to download it from. :-)
          

          Original comment by wallyhartshorn

          subversion-importer Subversion Importer added a comment -

          Wally's comment rings true.  We need the ability to maintain the size of our
          database without the crude dump->filter->load procedure.  I know disk space is
          cheap, relatively speaking, but "things" happen that must be cleaned up.
          

          Original comment by brentrwebster

          rooneg Garrett Rooney added a comment -

          Nobody is arguing that this feature is not useful; it's pretty much universally
          accepted that it would be a good thing to have.  The reason it hasn't been
          implemented is that it's hard to implement.  If you need it that badly, the
          source is available; feel free to send in a patch.
          

          blair Blair Zajac added a comment -

          Maybe the interested parties would be interested in putting up a
          bounty for this work to get done (not that I'm volunteering to
          take the bounty :) ).
          

          subversion-importer Subversion Importer added a comment -

          If adding an "svn obliterate" command is too difficult, perhaps an alternative
          would be an "svnobliterate" utility that would automate the workaround -- dump
          the repository, blow it away, then load it using the filter to omit the unwanted
          stuff.
          
          But it sounds like this won't be done any time soon, so I'm off to read about
          filtering....
          

          Original comment by wallyhartshorn
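A rough sketch of the "svnobliterate" wrapper suggested above, in Python with `subprocess`. It only chains the existing `svnadmin create`, `svnadmin dump`, `svndumpfilter exclude`, and `svnadmin load` commands; the function names, the choice to load into a fresh repository instead of deleting the original in place, and the argument layout are all assumptions. It does not copy hooks or configuration, and the source repository should not receive commits while it runs.

```python
import subprocess


def obliterate_commands(repos: str, new_repos: str, excluded: list[str]) -> list[list[str]]:
    """Build argv lists for: create new repo; dump | svndumpfilter exclude | load."""
    if not excluded:
        raise ValueError("nothing to exclude")
    return [
        ["svnadmin", "create", new_repos],
        ["svnadmin", "dump", repos],
        ["svndumpfilter", "exclude", *excluded],
        ["svnadmin", "load", new_repos],
    ]


def run(repos: str, new_repos: str, excluded: list[str]) -> None:
    create, dump, dumpfilter, load = obliterate_commands(repos, new_repos, excluded)
    subprocess.run(create, check=True)
    # Chain the stages with pipes so the full dump never has to sit on disk.
    p_dump = subprocess.Popen(dump, stdout=subprocess.PIPE)
    p_filter = subprocess.Popen(dumpfilter, stdin=p_dump.stdout, stdout=subprocess.PIPE)
    p_load = subprocess.Popen(load, stdin=p_filter.stdout)
    p_dump.stdout.close()    # only the child processes should hold the pipe ends now
    p_filter.stdout.close()
    codes = {"svnadmin dump": p_dump.wait(), "svndumpfilter": p_filter.wait(),
             "svnadmin load": p_load.wait()}
    failed = [name for name, code in codes.items() if code != 0]
    if failed:
        raise RuntimeError("stage(s) failed: " + ", ".join(failed))
```

For example, `run("/srv/svn/repo", "/srv/svn/repo-new", ["/trunk/tools"])`, after which the new repository would be swapped in for the old one. svndumpfilter may refuse to proceed if a kept path was copied from an excluded one, so the excluded list sometimes has to grow.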

          subversion-importer Subversion Importer added a comment -

          We will give our feedback the next time CollabNet wants to sell something over here :)
          
          

          Original comment by thurnerrupert

          ehuelsmann Erik Huelsmann added a comment -

          I'm very sure CollabNet has no place in the Subversion issue tracker. You can
          put your feedback on the dev@ mailing list for all developers to read, whether
          they work for Collab or not (like me). That will give you a bigger chance of
          actually having this change than restricting that knowledge to Collab.
          
          Thanks in advance.
          
          

          subversion-importer Subversion Importer added a comment -

          340MB of garbage is surely a lot, but on a repository I have (sigh!) to
          administrate, people work with single files of 600MB each! Disk space might be
          cheap nowadays, but you have to understand that I can't cope with more than a
          few commits of each of those files. What I would like to be able to do is to
          completely remove old versions of those files on a routine basis. If I had an
          obliterate command, writing a script to do so automatically would be quite easy.
          By the way, svnadmin dump/load is NOT a viable workaround for me: it takes more
          than 10 hours just to dump the entire database...
          

          Original comment by iaanus

          ziesemer Mark A. Ziesemer added a comment -

          Hence the reason I'm stuck with CVS for now...
          

          subversion-importer Subversion Importer added a comment -

          I know money rarely motivates developers, but if everyone who needs this feature
          contributed something financially, would that help at all?
          
          To me personally this is the one and only major feature that Subversion is
          crying out for; it's unfortunate that it's quite a biggie both from "without it,
          bad things happen" and "it seems to be a big deal to add" points of view.
          

          Original comment by timj

          subversion-importer Subversion Importer added a comment -

          Ben Collins-Sussman wrote:
          
            "Simply put, permanent removal of a revision or node causes a vast 
             chain reaction in the filesystem.  Nodes are written as deltas 
             against one another;  the whole fs would need to 'rewrite' itself, 
             and this would probably take a non-trivial amount of time.  This 
             means bringing the repository off-line while this work is done... and 
             there's a high probability that all existing working copies will be 
             invalidated, since they'll be referring to invalid node ids."
          
          EXACTLY.  Where is the problem?  Granted, this isn't something you want to do 
          daily, but if it needs to be done, you will have the ability to do so.
          
          This method doesn't destroy the integrity of the data, and doesn't 
          leave "dead" nodes.  So you lose the "full" history (only in cases where 
          middle revisions were removed) but that's what you WANT in that case, right?
          
          This change would make me ecstatic. :)
          

          Original comment by billnolan
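Ben's point that "nodes are written as deltas against one another" can be illustrated with a deliberately simplified Python model. It assumes a plain linear chain in which every revision of a single file is stored as a delta against the previous one; Subversion's real BDB and FSFS back ends lay their deltas out differently (skip-deltas), and the `DeltaChain` class and its methods are hypothetical. The point it shows: before a middle revision can be removed, its successor's delta has to be rebuilt, because simply deleting the stored delta leaves the rest of the chain unreadable.

```python
import difflib
from typing import List, Tuple

# Toy delta format (NOT Subversion's): the number of lines in the base text it
# was computed against, plus difflib opcodes. "equal" ops copy lines from the
# base; every other op carries the replacement lines it needs.
Op = Tuple[str, int, int, List[str]]
Delta = Tuple[int, List[Op]]


def make_delta(base: List[str], target: List[str]) -> Delta:
    sm = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    ops = [(tag, i1, i2, [] if tag == "equal" else target[j1:j2])
           for tag, i1, i2, j1, j2 in sm.get_opcodes()]
    return len(base), ops


def apply_delta(base: List[str], delta: Delta) -> List[str]:
    base_len, ops = delta
    if len(base) != base_len:
        # Cheap sanity check: this delta was made against some other text.
        raise ValueError(f"delta expects {base_len} base lines, got {len(base)}")
    out: List[str] = []
    for tag, i1, i2, new in ops:
        out.extend(base[i1:i2] if tag == "equal" else new)
    return out


class DeltaChain:
    """Revision 0 is stored as full text; revision k as a delta against k - 1."""

    def __init__(self, first: List[str]) -> None:
        self.base = list(first)
        self.deltas: List[Delta] = []  # deltas[k - 1] turns rev k - 1 into rev k

    def youngest(self) -> int:
        return len(self.deltas)

    def commit(self, text: List[str]) -> int:
        prev = self.fulltext(self.youngest())
        self.deltas.append(make_delta(prev, list(text)))
        return self.youngest()

    def fulltext(self, rev: int) -> List[str]:
        if not 0 <= rev <= self.youngest():
            raise IndexError(rev)
        text = list(self.base)
        for delta in self.deltas[:rev]:  # replay every delta up to rev
            text = apply_delta(text, delta)
        return text

    def obliterate(self, rev: int) -> None:
        """Drop revision rev (1..youngest); later revisions shift down by one."""
        if not 1 <= rev <= self.youngest():
            raise IndexError(rev)
        if rev < self.youngest():
            # The successor's delta was computed against rev, so it must be
            # rebuilt against rev - 1 *before* rev disappears.
            before = self.fulltext(rev - 1)
            after = self.fulltext(rev + 1)
            self.deltas[rev] = make_delta(before, after)
        del self.deltas[rev - 1]
```

For example, with revisions `["a", "b"]`, `["a", "b", "SECRET"]`, `["a", "b", "c"]`, calling `obliterate(1)` leaves two revisions whose stored deltas no longer mention `SECRET`, and the later revision is renumbered. Even in this toy, one removal forces a rewrite of the neighbouring data plus a renumbering, which hints at the chain reaction the quote describes once copies, directories, and many files are involved.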

          sussman Ben Collins-Sussman added a comment -

          Bill:  there is no problem.  Not in theory, at least.  It's just a very
          complicated chunk of programming to pull off, and nobody's bothered to try and
          tackle the challenge yet.  Nobody questions the need for this to happen.
          

          subversion-importer Subversion Importer added a comment -

          I don't know if I'd call it "complicated".  It shouldn't be; but that depends 
          on the underlying architecture.  :)  As it proliferates through the foundation
          of the system itself, it's vital that it is implemented in a robust and 
          comprehensive manner.  It may even involve some core changes to the basic 
          architecture to properly and gracefully support this kind of functionality.
          
          Anyhow, does anyone know if this task is planned?
          Also, how would I get involved in SVN development?  (And no, that doesn't mean 
          I'm volunteering! :)  )
          

          Original comment by billnolan

          lgo Lieven Govaerts added a comment -

          Can you please take this discussion to the mailing list? The issue tracker is
          not a discussion forum. You can still add a comment here with a link to the
          discussion thread if it contains content valuable for the solution of this issue.
          
          thanks.
          

          subversion-importer Subversion Importer added a comment -

          How are these comments irrelevant?  You lost me there.
          

          Original comment by billnolan

          subversion-importer Subversion Importer added a comment -

          FYI all - as a Perforce administrator, the ability to obliterate is not just a
          convenience, it's a requirement.  As repositories grow over time (ours grows at
          a logarithmically increasing rate, currently about 50GB / year), at some point, it
          makes sense to archive certain parts of the repository.  That's not possible
          unless there's a way to dump certain parts of the repository, then obliterate them
          from the production space.  If the old information is needed, reference back to
          the archive is made.
          
          The problem we have is that as repositories grow, performance is negatively
          impacted.  If there's no way to remove obsolete information from a production
          repository, it becomes bloated at best and useless at worst.  I understand that
          there is a request to take this discussion to the lists; however, I feel it's
          important to report that the lack of an obliterate function is making it
          difficult for my company to justify switching to Subversion.
          
          It's been said (above) that there's no question that an Obliterate function is
          useful.  The question is, when is someone going to move things from being "nice
          to have" to making obliterate a requirement for Subversion?  I would hope that
          it's implemented prior to the next release (1.5 or 2.0).
          
          Use cases:
          1) Removal of confidential information
          2) Removal of obsolete information
          3) Removal of tags, branches, or trees that have moved to other repositories.
          
          

          Original comment by kbenton

          subversion-importer Subversion Importer added a comment -

          While I understand that this is a complicated feature to add, I would also love
          to see it scheduled. This 5-year-old entry is now the 7th oldest of the 414
          unresolved issues in the SVN tracker.  When my company switched to Subversion
          from CVS, we were worried about this missing feature, but we believed the
          statement in the SVN book that it would be added.  CVS doesn't have the explicit
          command, but you just have to remove the corresponding ",v" file from the
          repository.
          
          Some people suggest a dump/svndumpfilter/load cycle as an alternative, but that
          is very problematic.  It requires repository downtime and takes a long time to
          run on large repositories.  Worse, there are many cases (particularly involving
          moves between excluded and included sections of the repository) that
          svndumpfilter can't handle.  And sometimes svndumpfilter produces a corrupt dump
          file which svnadmin refuses to load (for example, see bug 1853 which hasn't been
          resolved in 2.5 years).  SVN devs have posted to the mailing list that
          svndumpfilter is an untenable design and needs to be rewritten from scratch.
          
          This feature is important for both privacy and resource usage reasons.  SVN is
          still my favorite revision control system, but I hope you will catch up to all
          of the other RCSs which already have this feature.
          

          Original comment by jeffc

          subversion-importer Subversion Importer added a comment -

          If I could sponsor an experienced SVN developer to develop this feature, what
          would be the cost and timeline?
          

          Original comment by brentrwebster

          subversion-importer Subversion Importer added a comment -

          for exactly such things, subversion is outdated by now. a switch to mercurial
          or bazaar-ng version control is appropriate.
          
          slim python code and features are added within weeks :)
          
          
          

          Original comment by soloturn

          ehuelsmann Erik Huelsmann added a comment -

          Could you please not start a good/bad VC discussion in the Subversion issue
          tracker? Comments related to this issue are welcome, of course.
          

          subversion-importer Subversion Importer added a comment -

          Not knowing precisely how subversion stores revision paths (in either BDB or 
          FSFS), I do not fully grasp the complexity of the situation.  However, this is 
          a terribly useful feature, and here's my $0.02:
          
          1) obliterate should allow one to remove a particular path (file or directory) 
          from the repository with the option to remove back to:
            a) the prior revision
            b) revision N
            c) the dawn of time
          
          2) leaving a log entry for the obliteration should be at the obliterator's 
          discretion; without a log entry, all trace of the path can (should? or yet 
          another option?) be wiped
          
          How does obliteration invalidate existing working copies?  Shouldn't they just diff
          against whatever is left?  Is there cached information that I'm missing here?
          
          If someone is willing to point out the difficulties in terms of the underlying 
          data structures, I will gladly put some thought into getting around them.  I 
          could slog through the source, but something tells me that would take a bit 
          longer...
          
          Thomas S. Trias
          Senior Developer
          Artizan Internet Service
          
          

          Original comment by tomtrias

          subversion-importer Subversion Importer added a comment -

          RE: Comments from Brent Webster Thu Dec 21 18:53:21 
          " If I could sponsor an experienced SVN developer to develop this feature, what
          would be the cost and timeline? "
          
          .. sounds like Brent means business... and he is probably not the only one.
          would it be reasonable to assume that, between all of us who have some
          budget, we could pool together a very decent salary for a programmer
          committed to making sure this gets resolved in the next stable release of
          subversion?
          
          I am fairly confident we could contribute some budget for this...
          
          
          
          
          
          

          Original comment by manolson

          subversion-importer Subversion Importer added a comment -

          looks like there's some form of work in progress at
          http://svn.collab.net/repos/svn/trunk/contrib/server-side/svn-obliterate.py
          
          Unfortunately, there's some rather onerous warnings about "will destroy your
          repository" there... so I don't think this is an option yet.
          
          Can we narrow the scope a bit here, based on previous comments and a few
          observations of my own:
          1) The primary need for such a function is the removal of confidential or
          legally dubious materials. Other secondary applications may be practical, but
          shouldn't be the primary target, since arguably only the removal of
          confidential/legally dubious material is time critical.
          2) Obliteration should be part of svnadmin; it's not practical to allow it
          client-side, as it needs direct repository access and is a security concern.
          3) It is better for the target scenario for all revisions of a file to be
          temporarily unavailable than to unnecessarily delay removal of improper
          materials in the repository. 
          4) The obliterate command should affect a target file and revision, and continue
          recursively to the HEAD. (straightforward removal)
          5) The obliterate command should optionally work in a "dump and obliterate"
          mode, in which the material affected by the obliterate operation is preserved in
          dump format. (permits the later reconstruction and reimport of intervening
          "good" revisions.)
          6) If it would ease the implementation, obliteration may fail with an error if
          there are intervening copies or moves from the target file between the revision to
          be deleted and HEAD. (if implemented in this manner, the administrator would
          have to manually walk through following the error messages to find and
          obliterate these first)
          7) A dump/load cycle would remain the preferred method for non-emergency removal
          of revisions/files and/or nontrivial removals.
          
          Thoughts?
          
          

          Original comment by sdaugherty

          subversion-importer Subversion Importer added a comment -

          a quite good approach imo, except one point:
          
          obliterate should be possible for a user. an experienced user can walk through
          error messages as well. it might be a right granted by the admin to avoid vandalism.
          
          reason: big organisations/repositories try to be as admin-less as possible. one
          of the decision points for or against a product is the admin cost factor.
          
          
          
          
          

          Original comment by thurnerrupert

          subversion-importer Subversion Importer added a comment -

          >1) The primary need for such a function is the removal of confidential or
          > legally dubious materials. Other secondary applications may be practical, but
          > shouldn't be the primary target, since arguably only the removal of
          > confidential/legally dubious material is time critical.
          
          Although that need is certainly the most time-critical, my actual need is just
          to save disk space, by removing from the server obsolete versions that won't be
          needed anymore, without disrupting all working copies. By reading the comments
          on this page and in several other places, I'm not alone. I wouldn't label such a
          use case as "secondary".
          
          >4) The obliterate command should affect a target file and revision,
          > and continue recursively to the HEAD. (straightforward removal)
          
          I don't agree with this requirement. In fact, in my use case it should work in
          exactly the opposite way, that is from a specified revision N backwards in time,
          thus leaving the revisions from N+1 to HEAD intact. By the way, if I understand
          correctly how deltas work, that is even easier to implement, because you don't
          even need to recompute deltas, you just "forget" some of them.
          
          

          Original comment by iaanus

          subversion-importer Subversion Importer added a comment -

          Another usage case: We made the mistake of storing all our internal
          documentation (which was mostly in .doc format) in the Documentation directory
          in our internal repository, and revisions to this directory have bloated up the
          repository incredibly. We now don't have room in our backup server for the
          repository. Deleting the Documentation directory in our repository won't make the
          repository any smaller. We need to completely obliterate all the bloat the
          history adds for this directory. Then we'll delete the directory from the
          repository, and no one will make the mistake again.
          

          Original comment by netdragon

          subversion-importer Subversion Importer added a comment -

          The only concern I have with the way it's been proposed so far, unless I'm
          interpreting things wrong, is that the revision numbers will be gone. It may be
          useful to clear out intermediary data, but keep the messages there because
          sometimes they tie in with Trac tickets and other tools, and also it may be
          useful to see what happened in between, even if the diffs are no longer present.
          Changing the actual revision numbers may also break other tools that rely on SVN
          a lot more than just clearing out the diffs for a revision. Also, I'm not sure,
          but would doing it this way spare people from having to check out the
          repository again?
          
          I propose leaving the revision numbers, and messages, and just removing the
          diffs. Since I'm just an SVN user, I don't know if this is something that makes
          sense from the standpoint of someone who understands the nuts and bolts of SVN,
          or if it's something that's been mentioned here in a different way, but
          hopefully it's something you can put into consideration, thanks :-)
          

          Original comment by netdragon

          subversion-importer Subversion Importer added a comment -

          @Brian: if I understand it correctly, the task you describe can already be
          handled easily with a dump/filter/load cycle, as svndumpfilter is already
          able to completely remove files and their *whole* history on a path basis. You
          just need enough disk space to hold both the original and the "cleaned-up"
          repositories during the cycle, but, after that, the original repo can be
          deleted. The true use cases for obliterate are, IMHO, those described by
          Stephanie and myself, where you need to remove just some revisions of a file,
          leaving the file in the repository and *without* disrupting the working copies.
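A minimal Python sketch of that dump/filter/load cycle. It assumes shell access to the server and that `svnadmin` and `svndumpfilter` are on `PATH`. The helper names `build_filter_commands` and `run_filter_cycle` and the repository paths are hypothetical. The sketch passes neither `--drop-empty-revs` nor `--renumber-revs`, so revisions emptied by the filter should stay in place with their original numbers and log messages. Note that svndumpfilter may abort if an excluded path was later copied to a path that is kept.

```python
import subprocess
from typing import List, Sequence


def build_filter_commands(src_repo: str, dst_repo: str,
                          excluded: Sequence[str]) -> List[List[str]]:
    """Argv lists for: create dst; dump src | svndumpfilter exclude | load dst.

    No --drop-empty-revs / --renumber-revs, so revisions emptied by the
    filter are kept and every revision number stays the same.
    """
    if not excluded:
        raise ValueError("at least one path prefix to exclude is required")
    return [
        ["svnadmin", "create", dst_repo],
        ["svnadmin", "dump", "--quiet", src_repo],
        ["svndumpfilter", "exclude", *excluded],
        ["svnadmin", "load", "--quiet", dst_repo],
    ]


def run_filter_cycle(src_repo: str, dst_repo: str, excluded: Sequence[str]) -> None:
    create, dump, filt, load = build_filter_commands(src_repo, dst_repo, excluded)
    subprocess.run(create, check=True)
    # Stream the dump straight through the filter into the new repository,
    # so no intermediate dump file has to be written to disk.
    dump_p = subprocess.Popen(dump, stdout=subprocess.PIPE)
    filt_p = subprocess.Popen(filt, stdin=dump_p.stdout, stdout=subprocess.PIPE)
    load_p = subprocess.Popen(load, stdin=filt_p.stdout)
    dump_p.stdout.close()  # only the child processes hold the pipe ends now
    filt_p.stdout.close()
    codes = [load_p.wait(), filt_p.wait(), dump_p.wait()]
    if any(codes):
        raise RuntimeError(f"dump/filter/load failed with exit codes {codes}")
```

For example, `run_filter_cycle("/srv/svn/old", "/srv/svn/new", ["trunk/big.iso"])` would build the cleaned-up copy. Swapping the directories and deleting the original are left to the admin, as described above.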
          

          Original comment by iaanus

          subversion-importer Subversion Importer added a comment -

          Here's a quite common scenario in many companies:
          - most devs are completely clueless
          - admins are assholes or lazy or incompetent
          - somebody thinks it's a good idea to check in some huge binary files into the
          source repo, or he does so accidentally
          - you can't get a hold of the SVN server admin, or he's not cooperating at all
          - even if the admin is nice and helpful, you have to tell everybody to stop
            checking in and checking out until this is resolved. Typically, it takes 1-2
          hours for the first round, then you try to work with it and notice you missed
          something and have to redo it, so it takes a whole day, during which your dev
          team is in disarray.
          
          In practice it means you either accept the crap in your repo staying there
          forever or you waste half a day to one day of half of your dev and admin team on
          this.
          
          This happens quite regularly (about once per year for me) and is kind of the
          nightmare for somebody who cares about the repo.
          
          Case today:
          - Somebody accidentally checked in the debug build of Mozilla - the object files.
          2 GB, 36000 files, one checkin.
          - admins were very nice and helpful, once I pointed them to the handbook
          section about svndumpfilter.
          - took about half a day to fix
          - I didn't see that the release build was checked in, too, and the filter didn't
          catch that.
          - admins already left for the weekend, and nobody knows the password of the SVN
          server.
          
          So, anything that requires shell access to the SVN server does not suit the
          problem. Neither does anything that takes a long time.
          
          What we need is perhaps a special kind of SVN account/right, one that can be and
          is granted to lead developers, not just admins. It should be able to get rid of
          revisions entirely.
          
          Implementation:
          I think the idea in comment from Jason Robbins Sun Mar 13 18:29:03 -0700 2005 is
          good.
          Another way that would work for me would be to completely delete all revisions
          starting from the faulty one - all later revisions could be deleted, too. It
          should be easy to implement. It means that I have to halt all development (or
          rather checkins) and have to merge a few revisions manually, and everybody may
          have to check out from scratch, so that's not ideal, but better than the status quo.
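A sketch of that "cut everything from the faulty revision onward" plan as Python that only builds the commands. `svnadmin dump -r LOWER:UPPER`, `svnadmin load` and `svn diff -c REV` are real commands. The `plan_truncation` helper, the paths and the URL are hypothetical. The dump command's output would be piped into the load command, as in any dump/load cycle. The per-revision diffs are one way to capture the later good changes for manual re-merging, and they may still contain traces of the bad commit if they touched the same files.

```python
from typing import Dict, List


def plan_truncation(src_repo: str, dst_repo: str, repo_url: str,
                    faulty_rev: int, head_rev: int) -> Dict[str, List]:
    """Keep revisions 0..faulty_rev-1; list the later ones to redo by hand."""
    if not 1 <= faulty_rev <= head_rev:
        raise ValueError("faulty_rev must be between 1 and head_rev")
    later = list(range(faulty_rev + 1, head_rev + 1))
    return {
        "rebuild": [
            ["svnadmin", "create", dst_repo],
            # Everything before the bad commit; pipe this into the load below.
            ["svnadmin", "dump", "--quiet", "-r", f"0:{faulty_rev - 1}", src_repo],
            ["svnadmin", "load", "--quiet", dst_repo],
        ],
        # One patch per good revision after the bad one, taken from the old repo.
        "patches": [["svn", "diff", "-c", str(rev), repo_url] for rev in later],
        "remerge": later,
    }
```

Kept revisions keep their numbers, but new commits will reuse the numbers from the faulty revision onward. Working copies at those revisions therefore have to be checked out again, as noted above.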
          

          Original comment by benb

          subversion-importer Subversion Importer added a comment -

          Would an automated form of the suggestion made by Alberto Barbati be adequate?
          
          E.g.: oldrepo > dump > filter > newrepo
          
          Could we provide an svnadmin command that leverages the dump and filter
          capabilities to produce a filtered clone that is then 'swapped in' once it
          catches up to the HEAD.
          
          I expect that this would:
          - need to block users only briefly
          - provide an easy means of retaining the pre-filtered content as a backup (just
          don't delete the old repo)
          - leverage existing code, which would mean a quicker resolution and perhaps
          less time spent testing.
          
          The 'swap in' could be little more than taking the oldrepo offline briefly,
          renaming/moving the appropriate files and then coming back online.
          
          Inputs would be a starting revision and a list of path/patterns to filter.
          
          Depending on the repository type, it might be cheaper to use filesystem
          operations to copy the content prior to the starting revision and only use the
          dump>filter>load cycle on a subset of revisions.
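A minimal Python sketch of the "catch up, then swap in" loop, under several assumptions. `replay(lower, upper)` stands for a dump | svndumpfilter | load pipeline over that revision range. `youngest` could wrap `svnlook youngest REPOS`. `offline` is whatever mechanism blocks commits, for example a pre-commit hook that rejects everything. All helper names are hypothetical, and how svndumpfilter handles copies across incremental ranges has not been checked here.

```python
from typing import Callable, ContextManager, List

Replay = Callable[[int, int], None]  # replay(lower, upper): dump | filter | load that range


def dump_range_cmd(repo: str, lower: int, upper: int) -> List[str]:
    # --incremental: the first dumped revision holds only its own changes.
    return ["svnadmin", "dump", "--quiet", "--incremental", "-r", f"{lower}:{upper}", repo]


def catch_up(loaded_up_to: int, youngest: Callable[[], int], replay: Replay,
             max_lag: int = 0, max_passes: int = 10) -> int:
    """Replay new revisions into the clone while the old repository stays online."""
    for _ in range(max_passes):
        head = youngest()
        if head - loaded_up_to <= max_lag:
            break  # close enough: the rest is done during the brief outage
        replay(loaded_up_to + 1, head)
        loaded_up_to = head
    return loaded_up_to


def swap_in(loaded_up_to: int, youngest: Callable[[], int], replay: Replay,
            offline: ContextManager, swap: Callable[[], None]) -> None:
    """Brief outage: block commits, replay the last few revisions, swap repos."""
    with offline:
        head = youngest()
        if head > loaded_up_to:
            replay(loaded_up_to + 1, head)
        swap()
```

Each online pass has less to copy than the one before. The outage therefore only has to cover the last few commits, which fits the expectation above that users are blocked only briefly.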
          
          This would still affect working copies. I'm not sure any implementation of
          this operation could avoid that unless path information were retained somewhere
          to indicate the removal, which would defeat the purpose to some degree.
          
          I don't know what effect this might have on details such as merge tracking,
          but I expect that merges from empty revisions wouldn't make much sense.
          
          

          Original comment by talden

          subversion-importer Subversion Importer added a comment -

          So the consensus seems to be that this is a feature that everyone wants but it
          is difficult to implement. Brent has asked how much it would cost to sponsor an
          experienced SVN developer to develop this feature (in terms of time and money)
          and I'm curious as well.
          
          If you put up a PayPal link that enables end-users to fund work on this feature
          I am fairly certain you will raise the necessary funds in no time. You can then
          hire someone to work on this feature full-time. Can we please have a formal
          reply to this proposal?
          
          A final point is that the priority of this issue is set to P4. Regardless of the
          difficulty of implementing this feature I would argue it should have a priority
          of P3 or even P2. And the issue type should probably be changed to "Feature".
          

          Original comment by cowwoc

          kfogel Karl Fogel added a comment -

          Agree with you about priority level and feature-ness (although note that the
          priority level is sort of meaningless: given that no one has gotten around to
          this in N years, it's not like upping the priority number is going to suddenly
          change something).  I've tweaked those two fields.
          
          As far as a "formal reply" to the bounty suggestion: I think the bounty idea is
          a good one, the trouble is that there aren't actually that many uncommitted
          developers available.  Most or maybe all have day jobs, and even when those day
          jobs are Subversion-related, that doesn't mean the developers are free to
          prioritize 'obliterate' over what their employer needs.
          
          I'm not sure just putting up a PayPal link somewhere would be enough to pay for
          this, but an organized effort to arrange a funding consortium might do the
          trick.  I don't have time for that myself right now, but maybe someone else does
          (or will at some point).  In any case, it can't hurt to have a note here that
          some people are willing to pay for this.
          

          subversion-importer Subversion Importer added a comment -

          Sounds like we are getting serious but I need three points covered before I
          raise a purchase req with my VP:
            - a design proposal
            - cost
            - references, especially for previous SVN-based submissions
          

          Original comment by brentrwebster

          kfogel Karl Fogel added a comment -

          We're nowhere near raising a purchase req, sorry.
          
          What you need is a developer who's interested & available to do this.  With
          that, all things are possible.  But we don't have that right now.  The stages
          you list above are precisely what such a developer would start on.
          

          subversion-importer Subversion Importer added a comment -

          As a senior software designer and a junior software architect with some knowledge
          of svn, I would like to contribute my analysis of the request. It is quite
          long and detailed, but I hope it is complete. I will start with a problem
          analysis and then continue with a solution proposal.
          
          Requirements:
          
          Reading through this issue, I came to the conclusion that what is requested is
          the ability to "delete specific revisions of specific files/paths, as if they
          never happened". This includes removing the entire history of one file/path, or
          removing an entire revision.
          
          Optionally, when deleting all revisions of a specific file/path, it is requested
          to also delete all files derived (moved/copied) from it (e.g. delete all tags
          of the file/path). Let's call this cascading.
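A minimal sketch of cascading as a breadth-first walk over copy history. It assumes that history has already been collected into a plain dict of hypothetical "copied-from to copied-to" path edges. A real repository would have to derive these edges from each node's copy-from information. The `cascade_targets` name is illustrative only.

```python
from collections import deque
from typing import Dict, Iterable, Set


def cascade_targets(root: str, copies: Dict[str, Iterable[str]]) -> Set[str]:
    """Return `root` plus every path derived from it by copy or move, transitively."""
    seen = {root}
    queue = deque([root])
    while queue:
        path = queue.popleft()
        for derived in copies.get(path, ()):
            if derived not in seen:  # guards against revisiting (and cycles)
                seen.add(derived)
                queue.append(derived)
    return seen
```

For example, if `/trunk/secret.txt` was tagged as `/tags/1.0/secret.txt` and branched to `/branches/x/secret.txt`, and that branch was tagged again, all four paths come back. An unrelated path returns only itself.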
          
          QaA:
          
          What happens when a specific revision of a specific file is removed? It means
          that the final results must be the same as if the associated commit never
          happened. However, all other commits (before or after) did happen. Or in other
          words, it means that the removed revision is to be merged with the next revision
          of the same file. If we have three revisions, A, B and C, of the same file and B
          is removed, then the results will be revision A and revision BC. When checking
          out revision B, we get A; when checking out C, we get C.
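A toy Python model of this rule. It assumes each entry in `history` stores the file's full text after that commit, so C's text already contains B's edits (the "BC" above). `content_at` is a hypothetical name. Obliterated commits are simply skipped when reading, and the latest surviving change at or before the requested revision wins.

```python
from typing import Dict, Optional, Set


def content_at(history: Dict[int, str], obliterated: Set[int], rev: int) -> Optional[str]:
    """Text of one file at `rev`, as if the obliterated commits never happened.

    history maps each revision that changed the file to its full text.
    Returns None if no surviving change exists at or before `rev`.
    """
    surviving = [r for r in history if r <= rev and r not in obliterated]
    return history[max(surviving)] if surviving else None
```

With `history = {1: "A", 2: "B", 3: "C"}` and `obliterated = {2}`, asking for revision 2 returns `"A"` and revision 3 still returns `"C"`. Nothing is physically removed; the obliterated text is just never served.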
          
          What happens if we remove a revision in which a file is added? The final
          results must be the same as if the file was added later, when its next
          revision occurs. This is what happens if we add a file in a working copy but
          forget to commit it, edit the file a bit, and only then commit it.
          
          What happens if we delete a revision in which a file is removed? The final
          results must be the same as if the file was never deleted. However, if in
          some later revision a new file is added at the same path, we must then delete
          the old file and start a brand new one: the old file is replaced by the new
          one. Otherwise the history would diverge. Note: it is appropriate to delete the
          old file in the same revision where the new file is started. Why? Because
          that is the point where the old file now dies out; it was not supposed to be
          deleted any sooner.
          
          What happens if we delete a range of revisions in which a file is added and then
          removed? The file never existed. So the results for any revision inside the
          range follow from the above: the file was not added until it was removed.
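A toy Python model of the add/delete cases above. It replays one path's events in revision order and skips the obliterated ones. The event format and the names `Node` and `state_at` are hypothetical. A changed `node_id` marks a brand-new file, which is how the model represents "the old file is replaced by a new one" after an obliterated delete.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# (revision, kind, content) with kind in {"add", "modify", "delete"}
Event = Tuple[int, str, Optional[str]]


@dataclass(frozen=True)
class Node:
    node_id: int  # a new id means a brand-new file (a replacement), not a new version
    content: Optional[str]


def state_at(events: List[Event], obliterated: Set[int], rev: int) -> Optional[Node]:
    """State of one path at `rev`, replaying only the non-obliterated events."""
    node: Optional[Node] = None
    next_id = 0
    for r, kind, content in sorted(events, key=lambda e: e[0]):
        if r > rev:
            break
        if r in obliterated:
            continue  # this commit "never happened"
        if kind == "delete":
            node = None
        elif kind == "add" or node is None:
            # A plain add; a modify whose add was obliterated (the file now
            # appears at its next surviving revision); or an add landing on a
            # file kept alive by an obliterated delete (the old file dies here).
            next_id += 1
            node = Node(next_id, content)
        else:
            node = Node(node.node_id, content)  # ordinary modification
    return node
```

With events add a1 (r1), modify a2 (r2), delete (r3) and add b1 (r5), obliterating r1 makes the file first appear at r2. Obliterating r3 keeps `a2` alive through r4, and r5 then yields a node with a new id. Obliterating r1 to r3 together leaves nothing at revisions 1 to 4.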
          
          What happens if we delete a revision in which a file is moved/copied? The results
          must be the same as if the file was moved/copied later, when its next revision
          at the new place occurs. This is similar to adding a new file and/or deleting an
          old one.
          
          What happens to the working copies? They might be invalidated. Some commits have
          effectively not happened yet, so the files may contain data that is not yet available.
          
          Analysis:
          
          Everything above can be done in two steps.
          
          The first step is to mark the requested revisions of the requested files/paths as
          obliterated. This is simple. No data is removed; the filesystem is not rewritten.
          However, from this point, all checkouts and updates must return the new data
          according to the QaA section above, which can be done within current
          architecture. Of course, this step does not physically remove any confidential
          info and frees no space, but can be done quickly without locking the repository.
          
          When dealing with invalidated working copies, we need to know that some
          revisions were modified in the meantime. In other words, a working copy is
          up-to-date only when its revision number is the same AND the time of the last
          obliteration is the same. When the time of the last obliteration is
          different, we need to update the working copy. This means comparing the
          server-side revision, the cached working base and the modified working copy. This
          seems to be a problem, because we have no code for it yet (afaik). Or we can
          give it the deep six and force a checkout when it happens. Well.. :))
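A minimal sketch of that two-part up-to-date check. It assumes the server exposes a hypothetical obliteration counter alongside HEAD. A monotonically increasing counter is used instead of a wall-clock time, so clock skew between client and server cannot matter. On a mismatch the sketch takes the "deep six" route and demands a fresh checkout, since the three-way reconciliation has no existing code. `Stamp` and `wc_action` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    revision: int          # revision the working copy (or the server HEAD) is at
    obliteration_seq: int  # hypothetical counter bumped by every obliterate


def wc_action(wc: Stamp, server: Stamp) -> str:
    """Decide what a client must do, per the two-part up-to-date rule."""
    if wc.obliteration_seq != server.obliteration_seq:
        # History under the working copy changed. The proper fix is a three-way
        # comparison (server revision, pristine base, local edits); lacking that
        # code, fall back to forcing a fresh checkout.
        return "checkout"
    if wc.revision != server.revision:
        return "update"  # an ordinary out-of-date working copy
    return "up-to-date"
```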
          
          The second step is to physically remove all redundant data. This means compacting
          the filesystem: rewriting it from BASE to HEAD according to the QaA
          section above. This may seem difficult; however, if the repository is
          locked meanwhile, it can be done with the code currently available (building a
          brand new repository from the old one). This step removes any confidential info
          and reduces the space taken by the repository.
          
          A note that can be ignored for now: there is no need for a full lock in the second
          step. New commits may actually happen in the meantime; just the old revisions
          may not be modified. So the lock may be progressive as the rewrite progresses.
          This leads to two locks: one that locks the HEAD of the repository during a new
          commit, and one that (progressively) locks revisions from the BASE while the
          rewrite is being done.
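A toy, single-process Python model of the two locks. A real implementation would need the locks to live in the repository (e.g. as lock files) rather than inside one Python process, and the class and method names are hypothetical. The HEAD lock serialises commits. The rewrite lock guards a watermark that grows from BASE: revisions at or below it are already rewritten and may no longer be marked, while revisions above it still can be. Only the final tail copy blocks commits. The locks are always taken in the order HEAD then rewrite, never the reverse, so the two cannot deadlock.

```python
import threading
from typing import Callable, Set

CopyFn = Callable[[int, bool], None]  # copy_revision(rev, obliterated) writes rev to the new store


class ProgressiveRewrite:
    """Toy model: commits stay open while old revisions are rewritten."""

    def __init__(self, head: int) -> None:
        self.head = head
        self.rewritten_up_to = 0  # watermark: revisions <= this are frozen
        self.marks: Set[int] = set()  # obliteration marks from the first step
        self._head_lock = threading.Lock()  # lock 1: serialises new commits
        self._rewrite_lock = threading.Lock()  # lock 2: guards watermark and marks

    def commit(self) -> int:
        with self._head_lock:
            self.head += 1
            return self.head

    def mark_obliterated(self, rev: int) -> bool:
        with self._rewrite_lock:
            if rev <= self.rewritten_up_to:
                return False  # already rewritten: may no longer be modified
            self.marks.add(rev)
            return True

    def _copy_next(self, head: int, copy_revision: CopyFn) -> bool:
        with self._rewrite_lock:
            if self.rewritten_up_to >= head:
                return False
            rev = self.rewritten_up_to + 1
            copy_revision(rev, rev in self.marks)
            self.rewritten_up_to = rev  # the locked region grows by one revision
            return True

    def rewrite_step(self, copy_revision: CopyFn) -> bool:
        with self._head_lock:
            head = self.head  # snapshot HEAD, then release so commits continue
        return self._copy_next(head, copy_revision)

    def finish(self, copy_revision: CopyFn, swap: Callable[[], None]) -> None:
        # Only the tail needs a full lock: block commits, copy the rest, swap.
        with self._head_lock:
            while self._copy_next(self.head, copy_revision):
                pass
            swap()
```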
          

          Original comment by jsimlo

          subversion-importer Subversion Importer added a comment -

          Ahh, I guess I could have mentioned that the two steps are completely
          independent of each other in terms of both their development and their
          usage. Though the second one is useless without the first one.
          

          Original comment by jsimlo

          sussman Ben Collins-Sussman added a comment -

          Hi Juraj,
          
          It's great to see all your design thoughts.  However, you've got the wrong forum.  As a developer
          community, we find the issue tracker awkward to have design discussions in.  The best thing to do is
          re-post your design to the dev@subversion.tigris.org list, and have our chats there.
          
          That said:  we're all busy scrambling right now to finish up svn 1.5 features (merge-tracking in particular), 
          so we may not quite have the bandwidth to dig into your design at the moment.  :-)
          
          
          

          subversion-importer Subversion Importer added a comment -

          Nevertheless, the first step would be fine for me. It would prevent the
          obliterated file from being retrieved by users. So it fulfils the data security
          motivation for obliterate.
          
          The second step, which seems to be much more difficult, is IMHO of lower priority.
          It belongs to the repository maintenance procedure.
          

          Original comment by beweiche

          subversion-importer Subversion Importer added a comment -

          > Use cases:
          > 1) Removal of confidential information
          > 2) Removal of obsolete information
          > 3) Removal of tags, branches, or trees that have moved to other repositories.
          
          I just removed a 524 MB MPEG file from one of our Subversion repositories that
          got accidentally committed. Please give me some place like Bounty Source
          (https://www.bountysource.com/) where I can dump my $50 to have this obliterate
          thing finally implemented, so I can stop worrying about when our corporate
          servers will need replacing to get extra hard drive and backup capacity.
          
          If everyone who cares about this puts a 50 or 100 dollar bounty on it you'll
          have a month's salary to work on it raised within a week, and to us corporate
          users $50 is cheaper than getting a sysadmin to fix it in 1 or 2 hours, even
          ignoring the fact that the repository would be offline during that period.
          

          Original comment by curry

          subversion-importer Subversion Importer added a comment -

          My proposed solution is to be able to 'pack' two or more revisions into one.
          
          This should be possible using svnadmin, for any valid revision range. Instead of a
          huge lockdown, it should be possible to simply pack two revisions into one on an
          as-needed basis.
          
          For example:
          
          Revisions 1:10000 in a 10500-revision repository are packed into one revision,
          revision 10000.
          
          The first 9999 revisions then become empty (or, if that is not possible, contain
          only a dummy file with the revision number), and revision 10000 is a snapshot of
          the repository tree at that revision, but with the history in the range 1:9999 lost.
          
          The files that were deleted in revisions 1:9999 would be lost too.
          
          Revision 10000 would then be the first useful revision of the repository.
          
          Advantages: 
          - The last revision numbers and changesets would remain the same. No working copies 
          would be made invalid (except for the extremely outdated, which no one is using 
          anyway). It will be transparent to end users.
          - Repositories could then be 'packed' by repository admins. Disk usage would then be
          much lower, and disk space issues would be solved.
          - As deleted files would vanish from history, this could also fix privacy concerns.
          

          Original comment by nicolay77
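          A minimal sketch of the proposed 'pack' operation, under a deliberately simplified
          assumption: each revision is stored as a full snapshot (path -> contents), which is
          not how Subversion's filesystem stores data. The function name `pack` and the toy
          layout are hypothetical. It shows the two properties claimed above: revision
          numbers are unchanged, and files deleted inside the packed range vanish from history.

```python
from typing import Dict

# Toy model (hypothetical): revision number -> full tree snapshot,
# where a snapshot maps path -> file contents.
Tree = Dict[str, str]


def pack(repo: Dict[int, Tree], last: int) -> Dict[int, Tree]:
    """Collapse revisions 1..last so only the snapshot at `last` survives.

    Revisions before `last` are kept as empty placeholders, so every
    revision number (and every revision after `last`) stays the same.
    """
    if last not in repo:
        raise ValueError(f"no revision {last} in repository")
    packed: Dict[int, Tree] = {}
    for rev, tree in repo.items():
        # Discard history before `last`; copy `last` and later unchanged.
        packed[rev] = {} if rev < last else dict(tree)
    return packed


repo = {
    1: {"a.txt": "v1", "movie.mpg": "..."},
    2: {"a.txt": "v2"},                  # movie.mpg deleted in r2
    3: {"a.txt": "v3", "b.txt": "b1"},
    4: {"a.txt": "v4", "b.txt": "b1"},
}
packed = pack(repo, 3)
print(sorted(packed))         # [1, 2, 3, 4]
print(packed[1], packed[2])   # {} {}
```

          In a delta-based store, the surviving snapshot could no longer be kept as a cheap
          delta, which is the disk-usage caveat raised in the reply that follows.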

          cmpilato C. Michael Pilato added a comment -

          Nice idea, Nicolay, but I'm not entirely convinced that admins will necessarily
          see the disk usage benefits of this that you claim.  Consider the common source
          code repository with nightly tags made.  Today, those tags cost next-to-nothing
          in terms of disk usage -- they are empty deltas against the directory they were
          copied from.  But the minute you flatten them out (and there's nothing to be an
          empty delta against), you've just caused a massive explosion in disk usage. 
          Maybe we could make some of this back by implementing (in the FS DAG subsystem)
          string/representation sharing between nodes, or deltas against other nodes in
          the same revision, but there lies a whole hunk of code complexity behind that door.
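          A back-of-the-envelope sketch of the disk-usage point above. The 200-byte size of a
          cheap-copy record and the 50 MB trunk are illustrative assumptions, not measured
          Subversion figures; the point is only that cheap tags grow with the number of tags,
          while flattened tags grow with the number of tags times the tree size.

```python
def repo_bytes(trunk_bytes: int, nightly_tags: int, flattened: bool,
               copy_record_bytes: int = 200) -> int:
    """Rough storage estimate for one trunk plus `nightly_tags` tags of it.

    Cheap copies store a small copy record per tag (copy_record_bytes is an
    assumed, illustrative size); flattened tags store the whole tree again.
    """
    per_tag = trunk_bytes if flattened else copy_record_bytes
    return trunk_bytes + nightly_tags * per_tag


cheap = repo_bytes(50_000_000, 365, flattened=False)
flat = repo_bytes(50_000_000, 365, flattened=True)
print(cheap)  # 50073000
print(flat)   # 18300000000
```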
          

          subversion-importer Subversion Importer added a comment -

          I see that this issue has been kicked around for over 6 years. I use Perforce
          and a hotshot programmer suggested I was dumb for paying for it instead of using
          the free Subversion software. So I'm here checking it out. 
          
          Obliterate is in Perforce and for that reason alone, I'm not willing to consider
          Subversion at this time, sorry.
          
          From what I read here, I think the reason the obliterate function hasn't been
          done is that the basic structure of Subversion makes it very very difficult.
          
          The way it works in Perforce is that each file in the repository is tracked
          separately thus allowing you to obliterate one or many (with wild cards) files
          with one submission. To safeguard the system, the file(s) must be deleted (which
          is just a flag) and the user needs a higher level of permission which means only
          specific users can do it. So files can be deleted, which removes them from the
          user's workspace but not from the repository, and then, if no one complains, they
          can be obliterated as desired.
          
          This satisfies some but not all the "requirements" posted here, for example, it
          removes the file and recovers space but does not retain the most recent
          version(s). So if someone submits a really bad version of a program, you need
          to save the last good version outside of Perforce, delete and obliterate the
          file (all versions), then put the saved version back as a new file. I'm pretty
          sure the changelists will retain information about the file, i.e. when it was
          added, updated, deleted and obliterated.
          
          In the case where a bunch of crap is accidentally submitted, it's easy to get
          rid of.
          
          I hope my comments are taken as constructive. As others have stated, this is a
          very desirable feature.
          

          Original comment by pcrerun
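          A toy model of the Perforce workflow described above: deleting only sets a flag,
          while obliterating requires a prior delete plus elevated permission, removes every
          stored version, and leaves a record in the log. The class and method names are
          hypothetical and are not Perforce's commands or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Depot:
    """Hypothetical model of delete-then-obliterate; not Perforce's API."""
    admins: Set[str] = field(default_factory=set)
    versions: Dict[str, List[str]] = field(default_factory=dict)
    deleted: Dict[str, bool] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def submit(self, path: str, content: str) -> None:
        self.versions.setdefault(path, []).append(content)
        self.deleted[path] = False
        self.log.append(f"submit {path}")

    def delete(self, path: str) -> None:
        # Only a flag: every stored version is still in the depot.
        self.deleted[path] = True
        self.log.append(f"delete {path}")

    def obliterate(self, user: str, path: str) -> None:
        if user not in self.admins:
            raise PermissionError(f"{user} may not obliterate")
        if not self.deleted.get(path, False):
            raise ValueError(f"{path} must be deleted before obliterating")
        del self.versions[path]   # all versions go, so their space is freed
        del self.deleted[path]
        self.log.append(f"obliterate {path}")   # the history entry remains


depot = Depot(admins={"root"})
depot.submit("huge.bin", "v1")
depot.submit("huge.bin", "v2")
depot.delete("huge.bin")
depot.obliterate("root", "huge.bin")
print("huge.bin" in depot.versions)  # False
print(depot.log[-1])                 # obliterate huge.bin
```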

          subversion-importer Subversion Importer added a comment -

          As I understand it, the biggest problem preventing implementation of this feature
          is the case where someone needs to obliterate one revision of a file and keep all
          other revisions. However, it's still possible to implement an "obliterate" that
          completely wipes a file and all its revisions, as if it never existed (and this
          would be enough for many people who just want to remove files that shouldn't be
          in the repository).
          Or alternatively, to wipe all revisions of a file after a specified revision.
          Implementing such a feature shouldn't be very difficult.
          What do you think?
          

          Original comment by afaber
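          A minimal sketch of the two simpler variants proposed above, again assuming a toy
          repository stored as full per-revision snapshots (not Subversion's real format).
          The name `obliterate_path` and its `after` parameter are hypothetical. It treats
          `path` as either a file or a directory prefix, and it deliberately ignores working
          copies, which is the objection raised in the next comment.

```python
from typing import Dict, Optional

Tree = Dict[str, str]   # toy snapshot: path -> file contents


def obliterate_path(repo: Dict[int, Tree], path: str,
                    after: Optional[int] = None) -> Dict[int, Tree]:
    """Return a copy of `repo` with `path` (a file or directory) removed.

    after=None wipes it from every revision, as if it never existed;
    after=N keeps revisions <= N intact and wipes it from later ones.
    """
    def doomed(p: str) -> bool:
        return p == path or p.startswith(path + "/")

    result: Dict[int, Tree] = {}
    for rev, tree in repo.items():
        if after is None or rev > after:
            result[rev] = {p: c for p, c in tree.items() if not doomed(p)}
        else:
            result[rev] = dict(tree)
    return result


repo = {
    1: {"src/main.c": "1", "dist/big.iso": "x"},
    2: {"src/main.c": "2", "dist/big.iso": "y"},
}
everywhere = obliterate_path(repo, "dist")
from_r2_on = obliterate_path(repo, "dist/big.iso", after=1)
print(everywhere[1])   # {'src/main.c': '1'}
print(from_r2_on[1])   # {'src/main.c': '1', 'dist/big.iso': 'x'}
```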

          subversion-importer Subversion Importer added a comment -

          > Or alternatively, to wipe all revisions of file after the specified revision.
          > Implementing such feature shouldn't be very difficult. What do you think?
          
          That is not necessarily true. The moment you start to obliterate anything in any
          way (delete, add or modify), all working copies beyond such obliterated revision
          get dirty with no way of knowing. Svn would have to add (at least) some kind of
          versioning number to the revisions. Though your proposal is much simpler than
          covering the full set of use cases, it might require modifying some of the very
          core code.
          

          Original comment by jsimlo
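          One way to picture the "versioning number" suggested above: a minimal sketch,
          assuming a hypothetical repository-side log of obliterations and a working copy that
          remembers how many it had seen at checkout. None of these names exist in Subversion;
          the sketch only shows how a client could detect that a revision at or below its base
          was rewritten after checkout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repository:
    youngest: int = 0
    # Hypothetical: one entry per obliterate, recording the oldest revision
    # it rewrote. The list length acts as a "generation" number.
    obliterations: List[int] = field(default_factory=list)

    def commit(self) -> int:
        self.youngest += 1
        return self.youngest

    def obliterate(self, oldest_rewritten: int) -> None:
        self.obliterations.append(oldest_rewritten)


@dataclass
class WorkingCopy:
    base_rev: int
    seen: int   # number of obliterations that existed at checkout

    def is_stale(self, repo: Repository) -> bool:
        """True if a later obliterate rewrote a revision at or below base."""
        newer = repo.obliterations[self.seen:]
        return any(rev <= self.base_rev for rev in newer)


def checkout(repo: Repository) -> WorkingCopy:
    return WorkingCopy(base_rev=repo.youngest, seen=len(repo.obliterations))


repo = Repository()
for _ in range(2):
    repo.commit()
old_wc = checkout(repo)          # based on r2
for _ in range(3):
    repo.commit()
new_wc = checkout(repo)          # based on r5
repo.obliterate(4)               # an obliterate that rewrote r4
print(old_wc.is_stale(repo), new_wc.is_stale(repo))   # False True
```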

          subversion-importer Subversion Importer added a comment -

          >> Or alternatively, to wipe all revisions of file after the specified revision.
          >> Implementing such feature shouldn't be very difficult. What do you think?
          
          > That is not necessarily true. The moment you start to obliterate anything in
          > any way (delete, add or modify), all working copies beyond such obliterated
          > revision get dirty with no way of knowing. Svn would have to add (at least)
          > some kind of versioning number to the revisions. Though your proposal is much
          > more simple from the full set of use cases, it might require to modify some of
          > the very base code.
          
          It is this very thinking that has stalled this issue since 2001.  Right now,
          this very day, you have the ability to do just that, a filtering operation that
          will break all working copies - it simply has a massively impractical
          UI/workflow.  The solution is simply to provide a better means of performing the
          same results as dumping a repo, filtering it and reloading it into a repo with
          the same id and replacing the old repo.  Sure, a more comprehensive solution
          would be nice but there is also no momentum towards such a solution or in fact
          any real demand for one.
          
          Please, please consider this issue simply a request to support locking a repo while:
          
          - a path or selection of paths is filtered out of existing revisions, and/or
          - a specific revision is removed, and/or
          - empty revisions are filtered out (renumbering subsequent revisions).
          
          Trying to make the solution more than is requested is exactly why there is no
          progress. Not making it into 1.5 is about as post 1.0 as people can stand.
          

          Original comment by talden
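          The three operations requested above can be sketched over a toy changeset model
          (revision -> {path: new contents}), an assumption that ignores deltas and copies
          entirely. The function name and parameters are hypothetical. A removed revision is
          kept as an empty placeholder unless empty revisions are also dropped, in which case
          the survivors are renumbered densely and an old-to-new map is returned so that
          external references to revision numbers could be fixed up.

```python
from typing import Dict, List, Set, Tuple

Changeset = Dict[str, str]   # toy model: path -> new contents in that revision


def filter_repo(revs: Dict[int, Changeset], drop_paths: Set[str],
                drop_revs: Set[int], drop_empty: bool
                ) -> Tuple[Dict[int, Changeset], Dict[int, int]]:
    """Filter paths, remove whole revisions, and optionally drop empty ones.

    Returns the new revisions and a map from old to new revision numbers.
    """
    kept: List[Tuple[int, Changeset]] = []
    for rev in sorted(revs):
        if rev in drop_revs:
            changes: Changeset = {}   # a removed revision loses all its changes
        else:
            changes = {p: c for p, c in revs[rev].items()
                       if p not in drop_paths}
        if drop_empty and not changes:
            continue                  # skipped, so later revisions shift down
        kept.append((rev, changes))
    if drop_empty:
        mapping = {old: new for new, (old, _) in enumerate(kept, start=1)}
    else:
        mapping = {old: old for old, _ in kept}
    return {mapping[old]: ch for old, ch in kept}, mapping


revs = {1: {"a": "1"}, 2: {"secret": "s"}, 3: {"a": "2", "secret": "t"}, 4: {"b": "x"}}
new_revs, renumbered = filter_repo(revs, {"secret"}, set(), drop_empty=True)
print(renumbered)   # {1: 1, 3: 2, 4: 3}
```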

          kfogel Karl Fogel added a comment -

          No, that is not why this issue is stalled.  Everyone, as far as I know, is
          willing to punt on the question of working copy consequences for 'obliterate',
          and treat it as simply a repository operation.
          
          The feature is stalled because there are several different use cases -- several
          different things people mean when they say "obliterate".  Above, you describe
          one of them (and it's not a bad solution, either), but you inevitably hand-wave
          on a lot of the details, such as: if I specify that a given path is to be
          removed from every revision, do I also mean that copies based on that path
          should be removed?  Etc.
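          To make the copies question concrete, here is a minimal sketch that assumes a
          hypothetical flat map from each path to the path it was copied from. Real Subversion
          copies are of a node at a specific revision, and copying a directory implicitly
          copies its children, so this is a deliberate simplification. The `follow_copies`
          flag is exactly the kind of behavior a spec would have to pin down.

```python
from typing import Dict, Optional, Set


def paths_to_remove(copy_from: Dict[str, Optional[str]], target: str,
                    follow_copies: bool) -> Set[str]:
    """Decide which paths an obliterate of `target` removes.

    copy_from maps each path to the path it was copied from (None if it was
    added from scratch). Without follow_copies only `target` goes; with it,
    every path copied directly or transitively from `target` goes too.
    """
    doomed = {target}
    if not follow_copies:
        return doomed
    grew = True
    while grew:                      # repeat until no new descendants appear
        grew = False
        for path, src in copy_from.items():
            if src in doomed and path not in doomed:
                doomed.add(path)
                grew = True
    return doomed


history = {
    "trunk/secret.txt": None,
    "tags/1.0/secret.txt": "trunk/secret.txt",
    "tags/1.1/secret.txt": "tags/1.0/secret.txt",
    "trunk/README": None,
}
print(sorted(paths_to_remove(history, "trunk/secret.txt", follow_copies=True)))
```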
          
          Yes, in your solution, we could punt on the copies question, but the point is
          that whatever we do, it has to be well-thought-out enough that it's
          forward-compatible with future enhancements.  We can't just implement the first,
          easiest, obviousest thing we think of --  we have to make sure it doesn't block
          off likely avenues of improvement later.  Feature design is hard, there's no way
          around that.
          
          Of course these questions can all be resolved, I'm not saying they're
          showstoppers.  But it's not going to happen before 1.5, and adding more
          commentary to this issue isn't going to speed it up either.
          
          Please, folks.  This issue needs a requirements and design discussion, on the
          mailing list, not in the issue tracker, and (IMHO) we don't have the bandwidth
          for that discussion until 1.5 is out.  If someone wants to start it anyway, I
          certainly won't try to shut it down; I'm just saying I think such a thread would
          get more attention after 1.5, that's all.
          
          In any case, the next comment in this issue, whenever it comes, ought to be a
          pointer to a mailing list thread :-).
          

          ziesemer Mark A. Ziesemer added a comment -

          Please see also:
          http://blogs.quintor.nl/bbottema/2008/03/01/subversion-obliterate-the-forgotten-feature/
          

          kfogel Karl Fogel added a comment -

          Yeah, that blogger apparently doesn't check his moderation queue, because my two
          responses are still not appearing days after I left them.  But hey, this issue
          will do for a soapbox :-).  Here's what I wrote:
          
          ###
          
          Thank you for drawing more attention to this problem (maybe it will encourage
          people to try to solve it). I think you may have misanalyzed, though.
          
          The delay is not due to the supposed complexity of the codebase. Subversion’s
          repository code is not *that* horrifyingly complex :-). I mean, sure, you need
          to know a thing or two, but grokking the code is not the real obstacle.
          
          The real obstacle is that no one has stepped up to design this and see it
          through to completion. Figuring out the desired behavior (with enough
          specificity to actually implement) is what’s hard here, not the implementation
          itself.
          
          You misunderstood the citation that said “… the ability to permanently remove a
          file, dir, or revision from history forever. This means rewriting the whole
          filesystem; a big deal. …” What that commenter meant was that Subversion would
          have to rewrite repository data when an ‘obliterate’ is performed, not that
          programmers would have to rewrite the filesystem code in order to implement
          ‘obliterate’! (Even then it’s not completely true, and in any case it was
          carelessly worded; sorry you got misled.)
          
          Why haven’t we implemented ‘obliterate’? Well, because you haven’t :-). That is
          to say, there is no “we” here; work gets done by those who do it. Each person
          working on Subversion has their own reasons for doing it: some are paid
          full-time, some are paid part-time, and some are volunteering their time. In the
          first two categories, you can expect that those paying the bills will have
          something to say about their developers’ priorities, and so far they haven’t
          seen ‘obliterate’ as a compelling enough feature to fund. The volunteers
          apparently have other itches to scratch.
          
          But there can be new volunteers! We’d all like to see the feature happen. We’re
          willing to help with the design. (It’s true that I recently recommended
          discussion wait until after 1.5, but that’s a temporary thing, and certainly
          hasn’t applied for the last seven years.)
          
          ###
          
          I do think this comment was gratuitous:
          
          “There seems little we can do. Try downloading the sourcecode and see if you can
          make sense of it all. Be aware, even the core developers are afraid to touch it
          without refined documentation backup them up. Maybe go along with the bounty
          idea and attract Cowboy Coders that lack the compulsive need for documentation.”
          
          What are you talking about? The core developers, and even occasional patch
          contributors, touch that code all the time. Just run ‘svn log’ on our repository
          to see. The idea that fear of changing the code is somehow responsible for the
          delay is… how can I say this politely? … absolutely wrong.
          
          The code is not the obstacle here, the behavioral specification is. (There have
          recently been some good suggestions for incremental implementation requiring a
          less detailed spec; they would still require fleshing out, but they hold promise.)
          
          

          bbottema Benny Bottema added a comment -

          I apologize, I have to get used to the blogging system a little bit (as you 
          might've guessed from the posts :). Thank you for clarifying some issues I 
          mentioned (you can probably delete these posts from the tracker now?).
          

          kfogel Karl Fogel added a comment -

          Thanks.  No way to delete posts from the tracker, sadly, but I'm glad to see the
          comment thread on your original post.
          
          

          kfogel Karl Fogel added a comment -

          See this thread for an implementation proposal:
          
             http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=137319
             From: Karl Fogel <kfogel@red-bean.com>
             To: dev@subversion.tigris.org
             Subject: [PROPOSAL] how to implement 'svn obliterate'
             Date: Wed, 16 Apr 2008 16:23:36 -0400
             Message-ID: <87hce1pn07.fsf@red-bean.com>
          
          

          kfogel Karl Fogel added a comment -

          See http://svn.collab.net/repos/svn/trunk/notes/obliterate/ for what's up.
          

          subversion-importer Subversion Importer added a comment -

          best and first
          

          Original comment by pishro

          kfogel Karl Fogel added a comment -

          There's been more discussion on this feature, with some behavioral specification
          work I think.  Some places to check:
          
           
          http://svn.collab.net/repos/svn/trunk/notes/obliterate/obliterate-functional-spec.txt
          
          Also, mail threads:
          
           
          http://www.nabble.com/svn-obliterate:-The-four-types-of-obliteration-td22229950.html
            From: Magnus Torfason
            Date: Feb 26, 2009; 12:44pm
            Subject: svn obliterate: The four types of obliteration
          
          and
          
            http://svn.haxx.se/dev/archive-2009-02/0642.shtml
            From: Magnus Torfason <zulutime.net_at_gmail.com>
            Date: Thu, 26 Feb 2009 13:18:59 -0500
            Subject: svn obliterate: Keeping it real (or: What has already been implemented?)
          
          

          julianfoad Julian Foad added a comment -

          SUMMARY OF POINTS RAISED SO FAR
          
          I have compiled the following list of all the significant points, from a feature
          design perspective, that have been tracked in this issue so far.
          
          I am not judging what has been said, only providing a quick reference to it. I
          am not including what's been said in email and elsewhere. It is not a
          comprehensive index to all comments, and people's names are given in brackets to
          assist in finding an original comment and not to indicate credit for the ideas.
          
          
          OBLITERATE WHAT?
          
            a. A node-rev. (Possible restriction: and all copies thereof [SDaugherty].)
          
            b. A node through all revisions. (Including all copies thereof? Just along a
          single segment of its history? A path rather than a node?)
          
            c. A whole revision. (Possible restriction: and all subsequent revisions.)
          
            d. A patch. "Replay patches" onto a reverted state of the repos [BBehlendorf].
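          The four targets listed above differ mainly in which parameters they need and how
          much stored history they touch. A minimal sketch, assuming a hypothetical type per
          target and a rough upper bound on the revisions each might rewrite (for (d), the
          assumption that every later patch is replayed). None of these types exist in
          Subversion.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NodeRev:            # (a) one node-revision
    path: str
    rev: int
    include_copies: bool = False


@dataclass(frozen=True)
class NodeAllRevs:        # (b) a node through all revisions
    path: str
    include_copies: bool = False   # unused by the bound below


@dataclass(frozen=True)
class WholeRevision:      # (c) a whole revision
    rev: int
    and_subsequent: bool = False


@dataclass(frozen=True)
class Patch:              # (d) a patch, with later patches replayed
    rev: int


Target = Union[NodeRev, NodeAllRevs, WholeRevision, Patch]


def rewritten_revisions(t: Target, youngest: int) -> range:
    """Rough upper bound on the revisions an obliterate of `t` could rewrite."""
    if isinstance(t, NodeRev):
        # Copies of the node-rev can only appear in later revisions.
        end = youngest if t.include_copies else t.rev
        return range(t.rev, end + 1)
    if isinstance(t, NodeAllRevs):
        return range(1, youngest + 1)
    if isinstance(t, WholeRevision):
        end = youngest if t.and_subsequent else t.rev
        return range(t.rev, end + 1)
    # Patch: assume every later patch is replayed, rewriting each later revision.
    return range(t.rev, youngest + 1)


print(list(rewritten_revisions(WholeRevision(5, and_subsequent=True), 8)))  # [5, 6, 7, 8]
```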
          
          
          REASONS
          
          * "Security" - delete sensitive data that was accidentally committed.
          
            a. From view by svn clients.
          
            b. From view by server admins.
          
          This need, especially (a), is time-critical. "Better ... for all revisions of a
          file to be temporarily unavailable than to unnecessarily delay removal"
          [SDaugherty].
          
          * "Maintenance" - delete large data that is no longer necessary, to save disk
          space on the server.
          
            a. Delete large files that were accidentally committed [JRobbins, WHartshorn].
          
            b. Delete intermediate revisions of a sequence of changes that is no longer
          interesting in detail, leaving only a few significant revisions [THarning].
          
            c. Delete old revisions of large files that are routinely committed but only
          required for a short while (one or a few revisions) [BTutt, BWebster, ABarbati].
          
            d. Splitting the repository - moving certain projects, branches, tags, etc. to
          another repository [KBenton].
          
          
          ON-LINE/OFF-LINE
          
            * client-side (e.g. in "svn")
          
            * server-side (e.g. in "svnadmin")
          
          "You'd add it to libsvn_ra if you could." [BTutt]. Complexity of re-deltifying
          and existing WCs referring to invalid "node ids" [BCollins-Sussman]. "Remove the
          node contents and mark the node 'dead' (a new state)..." [BCibej]
          
          "SVN is a client/server app. Why shouldn't I be able to remotely administer it?"
          [BTutt]
          
          "We can & should do it through RA." [GStein]
          
          Dump-load approach: requires repos down-time, and is slow.
          
          Client-side approach "is a security concern" [SDaugherty].
          
          Should be client-side because a system's "admin cost" is a significant concern
          [TRupert].
          
          
          METADATA (REV-NUMS AND REV-PROPS)
          
          "Leaving a log entry for the obliteration should be at the obliterator's
          discretion; without a log entry, all trace of the path can (should? or yet
          another option?) be wiped" [TTrias]. I'm not sure if this refers to creating a
          new rev with a log message, or what.
          
          Revision numbers - keep even if the revision is empty, or renumber subsequent
          revisions [Talden]?
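          The renumbering option can be pictured as a mapping from old to new revision
          numbers. The Python sketch below is a hypothetical illustration only: the
          function name `renumber` is invented here, it is not how svndumpfilter
          implements its --drop-empty-revs / --renumber-revs options, and it assumes
          revision 0 always survives. Any revision number recorded outside the
          repository (in working copies, commit messages, bug reports) would no longer
          match after renumbering, which is the main cost of this choice.

```python
from typing import Dict, Iterable


def renumber(all_revs: Iterable[int], dropped: Iterable[int]) -> Dict[int, int]:
    """Map each surviving old revision number to a new, gap-free number.

    Hypothetical helper: revision 0 (the empty initial revision) is always
    kept and stays 0; every other revision listed in `dropped` is skipped.
    """
    dropped_set = set(dropped)
    mapping: Dict[int, int] = {}
    next_rev = 0
    for rev in sorted(all_revs):
        if rev != 0 and rev in dropped_set:
            continue  # this revision disappears entirely
        mapping[rev] = next_rev
        next_rev += 1
    return mapping


# Example: revisions 2 and 3 became empty after obliteration and are dropped.
print(renumber(range(6), [2, 3]))  # {0: 0, 1: 1, 4: 2, 5: 3}
```

          Keeping empty revisions instead corresponds to the identity mapping, which
          leaves every external reference valid.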
          
          
          EFFECT ON CLIENT; WC INTEGRITY
          
          Leave the files in place, but replace their content with an explanatory message
          [JRobbins].
          
          "Erase all revisions from the faulty one onwards - and let clients sort
          themselves out manually" [BBucksch].
          
          To detect brokenness, WC could store and compare "time of last obliteration"
          [JSimlovic].
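          The "time of last obliteration" idea amounts to a simple staleness check. A
          minimal Python sketch, assuming (hypothetically) that the repository publishes
          a single integer marker for its most recent obliteration and that the WC
          records the value it last saw; the names `WorkingCopyState` and
          `wc_needs_rebuild` are invented for illustration. A monotonically increasing
          counter would be more robust than a wall-clock time, since it does not depend
          on the server clock never moving backwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkingCopyState:
    # Hypothetical: obliteration marker the WC recorded at its last checkout/update,
    # or None if the repository had never been obliterated at that point.
    last_obliteration_seen: Optional[int]


def wc_needs_rebuild(wc: WorkingCopyState, repo_last_obliteration: Optional[int]) -> bool:
    """Return True if the repository was obliterated after the WC last synced."""
    if repo_last_obliteration is None:
        return False  # nothing has ever been obliterated
    if wc.last_obliteration_seen is None:
        return True  # the WC predates the first obliteration
    return repo_last_obliteration > wc.last_obliteration_seen


wc = WorkingCopyState(last_obliteration_seen=1_230_000_000)
print(wc_needs_rebuild(wc, 1_240_000_000))  # True: obliterated since the last update
```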
          
          
          OPTIONAL FUNCTIONALITY
          
          * Dump what's being obliterated [SDaugherty].
          
          
          WHAT CAN SVNDUMPFILTER DO ALREADY?
          
            a. NOT handle moves between excluded and included sections of the repository
          [JeffC]?
          
          
          DUPLICATE/RELATED ISSUES IN TRACKER
          
          #1260 - wanting 'obliterate' for server disk space recovery.
          #1848 - unrelated, not really a dup.
          
          
          PEOPLE WHO WERE POTENTIALLY OFFERING FUNDING
          
          Tim Jackson
          Brent Webster
          Auriel Manolson
          Niels Keurentjes
          
          
          PEOPLE WHO WERE SPECIFYING/DESIGNING
          
          Juraj Simlovic
          Karl Fogel
          Magnus Torfason
          
          
          SEE ALSO
          
          (Gone -
          <http://svn.collab.net/repos/svn/trunk/contrib/server-side/svn-obliterate.py>.)
          
          Blog/rant:
          <http://blogs.quintor.nl/bbottema/2008/03/01/subversion-obliterate-the-forgotten-feature/>
          
          Implementation proposal email (KFogel):
          <http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=137319>
          
          Functional spec in trunk:
          <http://svn.collab.net/repos/svn/trunk/notes/obliterate/obliterate-functional-spec.txt>.
          
          Design email (MTorfason):
          <http://www.nabble.com/svn-obliterate:-The-four-types-of-obliteration-td22229950.html>
          
          Investigative email (MTorfason):
          <http://svn.haxx.se/dev/archive-2009-02/0642.shtml>
          
          
          

          paul Paul Hammant added a comment -

          I get that fsfs does not make it easy to actually delete directories on the server side, given the
          nature of the revs folder and the deltas within it.
          
          However, one thing that could work for some people is apparent obliteration of the folder; that is,
          the svn client (checkout, up, log, pl, etc.) sees no directory in that location.  On the server side it is still
          there, but some new property marks it as "hidden from all clients".
          

          julianfoad Julian Foad added a comment -

          Please keep discussion on the dev@ mailing list. Thanks.
          

          julianfoad Julian Foad added a comment -

          Status update.
          
          I did some design
          <http://svn.apache.org/repos/asf/subversion/trunk/notes/obliterate/> and coding
          at the end of 2009 which resulted in an early prototype of an "svn obliterate"
          command.  It was able to discard a file from the latest revision in a BDB
          repository, but not from a non-head revision, nor from an FSFS repository.  The
          exercise revealed issues, such as a difficulty in scaling to real-world
          repository sizes, that suggest a more feasible approach would be less general
          and more focused on addressing one specific use case at a time.
          
          Since then the work has been on hold due to higher priorities, particularly the
          need to complete WC-NG before we can release any new version at all.  To avoid
          cluttering the code base for the 1.7 release, the code has been removed for the
          time being (r1091717), and when the work is resumed we can resurrect as little
          or as much of it as is useful.
          
          See also this email about the status
          <http://svn.haxx.se/dev/archive-2010-08/0457.shtml> (from me, 2010-08-18, "Re:
          1.7 and obliterate") and this milestone chart
          <http://svn.apache.org/repos/asf/subversion/trunk/notes/obliterate/plan-milestones.html>.
          

          subversion-importer Subversion Importer added a comment -

          For those needing a quick solution, git (http://git.or.cz) seems to provide
          this quite nicely:
          * http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/
          * http://kerneltrap.org/mailarchive/git/2007/10/9/333020
          
          
          

          Original comment by thurnerrupert

          subversion-importer Subversion Importer added a comment -

          +1 we need this command in our environment.  Our IT department is quite stingy
          with disk space.
          

          Original comment by bbrooks

          bbottema Benny Bottema added a comment -

          Just putting it out there that the blog post has moved to:
          http://blog.projectnibble.org/2008/03/01/subversion-obliterate-the-forgotten-feature/
          

          subversion-importer Subversion Importer added a comment -

          Quite interested in this feature: our users keep committing .iso images and other
          enormous, unneeded files by mistake.
          

          Original comment by alex71

          subversion-importer Subversion Importer added a comment -

          I would like this feature: if I commit a document (or a confidential document) and
          want to remove it, I don't want anybody to be able to retrieve it from history.
          
          If doing this is 'impossible' or 'very hard', provide an option to mark a file or
          folder when it is added (first commit) so that only its HEAD revision is kept
          (no history at each commit); then, when we delete it, it will be gone
          forever.
          

          Original comment by lazyleecher

          xentrax Vyacheslav Lanovets added a comment -

          Hi,

          Just in case somebody ever considers the feature, here are several use cases I can imagine:
          1. svnadmin should be able to purge everything related to entities that were deleted and not used since some point in time (i.e. before some revision).
          2. End users can mark an already deleted entity for purging, but svnadmin must be used to actually purge the marked data and amend all related revisions.
          3. End users can mark a range of revisions of some entity for merging (e.g. to roll back the addition of 1 GB of data to an existing file).

          Here is why this is needed in our case. We had a SourceSafe repository converted to SVN (the conversion tried to guess some revisions). Some commits date back to the 1990s. As the original repository contained several projects, the resulting repository was copied to several repositories, and unrelated folders were deleted, with plans for individual security via Active Directory. Over time the root directories in these repositories were reorganized to reflect growing product needs. Later, the per-repository security was judged not worth the history lost when transferring files between repositories, and the repositories were combined back into one. After that, the resulting directory structure was reorganized once more to accommodate multiple projects, libraries, tools, and SDKs. As a consequence of so many moves, reorganizations and copies, a repository that perhaps had 100,000 interesting commits now has 250K revisions. Because we quickly stopped committing big binaries, it is just under 30 GB.

          triadiktyo Doros Agathangelou added a comment -

          I would like to draw the attention of the community to a tool we developed that solves the problem of obliterating files from svn repositories. The tool is called Subdivision and while being a commercial offering, it is free for small repositories and open source projects.

          Subdivision starts off by analyzing the repository to identify the moves and copies of the repository's files and folders. As the user selects which files to obliterate, Subdivision selects additional files that have to be obliterated (or preserved) depending on the move and copy history. This eliminates the problems that can occur when using svndumpfilter, such as the "Invalid copy source path" error.

          In addition to obliteration, Subdivision can split a repository in two while guaranteeing that every file ends up in at least one of the resulting repositories, and it can extract files into a new repository.
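          The copy-history analysis described above can be illustrated with a transitive closure over copy sources. The following Python sketch is hypothetical and is not Subdivision's algorithm: it assumes a mapping from each copied or moved path to the path(s) it came from, and it ignores revisions and parent-directory copies, which a real tool would have to model. Given the paths to keep, it returns every path whose history must also be kept so that svndumpfilter-style filtering never produces a copy from a path that no longer exists.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def required_history(keep: Iterable[str], copied_from: Dict[str, List[str]]) -> Set[str]:
    """Return the kept paths plus every copy/move ancestor they depend on.

    copied_from maps a destination path to the source path(s) it was
    copied or moved from (a simplified, hypothetical model of history).
    """
    result: Set[str] = set()
    queue = deque(keep)
    while queue:
        path = queue.popleft()
        if path in result:
            continue  # already visited; also guards against cycles
        result.add(path)
        # A kept path needs its copy sources too, or loading the filtered
        # dump would hit a copy from a path that was filtered out.
        queue.extend(copied_from.get(path, []))
    return result


copied_from = {
    "/trunk/b.c": ["/trunk/a.c"],       # b.c was copied from a.c
    "/branches/x/b.c": ["/trunk/b.c"],  # the branch copy came from trunk
}
print(sorted(required_history(["/branches/x/b.c"], copied_from)))
# ['/branches/x/b.c', '/trunk/a.c', '/trunk/b.c']
```

          Anything outside the returned set could then be excluded without breaking a copy; everything inside it has to stay unless the dependent copies are obliterated as well.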


            People

            • Assignee: Unassigned
            • Reporter: sussman Ben Collins-Sussman
            • Votes: 0
            • Watchers: 3
