As a senior software designer and a junior software architect with some knowledge
of svn, I would like to contribute my analysis of this request. It is quite
long and detailed, but I hope it is complete. I will start with a problem
analysis and then continue with a solution proposal.
Reading through this issue, I came to the conclusion that the request is to be
able to "delete specific revisions of specific files/paths, as if they never
happened". This includes removing the entire history of one file/path, or
removing an entire revision.
Optionally, when deleting all revisions of a specific file/path, it is requested
to also delete all files derived (moved/copied) from this file (e.g. delete all
tags of the file/path). Let's call this cascading.
What happens when a specific revision of a specific file is removed? It means
that the final results must be the same as if the associated commit never
happened. However, all other commits (before or after) did happen. In other
words, the removed revision is merged into the next revision of the same file.
If we have three revisions, A, B and C, of the same file and B is removed, then
the results will be revision A and revision BC. When checking out revision B,
we get A; when checking out C, we get C.
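The A/B/C example can be written down as a small lookup. This is only a sketch
in Python, not svn code: the function name and the use of plain integers for
revisions are my assumptions, with A, B and C modelled as 1, 2 and 3.

```python
def effective_revision(file_revs, obliterated, requested):
    """Return the revision whose content a checkout of `requested` would see.

    file_revs   -- revision numbers in which the file changed (hypothetical model)
    obliterated -- set of revision numbers removed for this file
    Returns None when the file does not exist yet at `requested`.
    """
    visible = [r for r in sorted(file_revs)
               if r <= requested and r not in obliterated]
    return visible[-1] if visible else None


# A, B, C are modelled as revisions 1, 2, 3; B is removed.
revs = [1, 2, 3]
removed = {2}
print(effective_revision(revs, removed, 2))  # checking out B yields A
print(effective_revision(revs, removed, 3))  # checking out C yields C
```

The same lookup returns None when the removed revision is the one that added
the file, i.e. the file only appears at its next surviving revision.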
What happens if we remove a revision in which a file is added? It means that the
final results must be the same as if the file was added later, when its next
revision occurs. This is what happens if we add a file in the wc but forget to
commit it, edit the file a bit, and only then commit it.
What happens if we delete a revision in which a file is removed? It means that
the final results must be the same as if the file was never deleted. However, if
in some later revision a new file is added at the same path, we must delete the
old file at that point and start a brand new file. The old file is replaced by
a new one. Otherwise the history would diverge. Note: It is appropriate to
delete the old file in the same revision where the new file is started. Why?
Because that is the point where the old file now dies out, since it was not
supposed to be deleted any sooner.
What happens if we delete a range of revisions in which a file is added and then
removed? The file never existed. So the results for any revision inside the
range follow from the above: the file was not added until it was removed.
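The add/delete cases above can be checked with a toy model. The sketch below is
in Python and makes a strong assumption that is not true of the real svn
filesystem: every history entry carries the full file text rather than a delta.
The names `Event` and `state_at` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical per-path history entry: (revision, action, full_text).
# action is "add", "modify" or "delete"; full_text is None for a delete.
Event = Tuple[int, str, Optional[str]]


def state_at(history: List[Event], obliterated: Set[int],
             requested: int) -> Optional[str]:
    """Return the file text visible at `requested`, or None if it does not exist."""
    text = None
    for rev, action, full_text in sorted(history, key=lambda e: e[0]):
        if rev > requested:
            break
        if rev in obliterated:
            continue  # as if this commit never happened
        # A surviving add or modify (re)creates the file; a delete removes it.
        text = None if action == "delete" else full_text
    return text


history = [(1, "add", "v1"), (2, "modify", "v2"),
           (3, "delete", None), (5, "add", "new")]

print(state_at(history, {1}, 1))        # removed add: file does not exist yet
print(state_at(history, {3}, 4))        # removed delete: old file lives on
print(state_at(history, {3}, 5))        # new file replaces the old one here
print(state_at(history, {1, 2, 3}, 4))  # removed add..delete range: never existed
```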
What happens if we delete a revision in which a file is moved/copied? The results
must be the same as if the file was moved/copied later, when its next revision
at the new place occurs. This is similar to adding a new file and/or deleting an
old one.
What happens to the working copies? They might be invalidated. From the
repository's point of view some commits have not occurred yet, so the files may
contain data that is not yet available.
Everything above can be done in two steps.
The first step is to mark the requested revisions of the requested files/paths
as obliterated. This is simple. No data is removed and the filesystem is not
rewritten. However, from this point on, all checkouts and updates must return
the new data according to the Q&A section above, which can be done within the
current architecture. Of course, this step does not physically remove any
confidential info and frees no space, but it can be done quickly without locking
the repository.
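A minimal sketch of the marking step, in Python: a side table of marks that
readers consult, while the stored revisions stay untouched. The class
`MarkStore` and its methods are hypothetical and do not correspond to anything
in the svn code base.

```python
class MarkStore:
    """Hypothetical side table of obliteration marks; no stored data is changed."""

    def __init__(self):
        self._marks = {}  # path -> set of obliterated revision numbers

    def mark(self, path, revisions):
        """Mark the given revisions of `path` as obliterated."""
        self._marks.setdefault(path, set()).update(revisions)

    def is_obliterated(self, path, rev):
        return rev in self._marks.get(path, set())

    def visible(self, path, revs):
        """Filter a list of revisions down to those a client may still see."""
        return [r for r in revs if not self.is_obliterated(path, r)]


marks = MarkStore()
marks.mark("trunk/secret.txt", [2])
print(marks.visible("trunk/secret.txt", [1, 2, 3]))  # revision 2 is hidden
print(marks.visible("trunk/other.txt", [1, 2, 3]))   # other paths are unaffected
```

Because marking only adds an entry to the side table, it is cheap and does not
need a repository-wide lock, which matches the intent of the first step.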
When dealing with invalidated working copies, we need to know that some
revisions were modified in the meantime. In other words, a working copy is
up-to-date when the revision number is the same AND the time of the last
obliteration is the same. When the time of the last obliteration is different,
we need to update the working copy. This means comparing the server-side
revision with the cached working base and with the modified working copy. This
seems to be a problem, because we have no code for it yet (afaik). Or we can
give it the deep six and force a checkout when it happens. Well.. :))
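The up-to-date rule can be stated in a few lines of Python. This is a sketch
only: the `Stamp` type and its field names are hypothetical, and the time of the
last obliteration is assumed to be a plain number (0.0 if there was none).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    revision: int             # revision number of the working copy or the server
    last_obliteration: float  # time of the last obliteration seen, 0.0 if none


def is_up_to_date(working_copy: Stamp, server: Stamp) -> bool:
    """Up-to-date only if the revision AND the last obliteration time both match."""
    return (working_copy.revision == server.revision
            and working_copy.last_obliteration == server.last_obliteration)


wc = Stamp(revision=42, last_obliteration=0.0)
print(is_up_to_date(wc, Stamp(42, 0.0)))     # same revision, no obliteration since
print(is_up_to_date(wc, Stamp(42, 1700.0)))  # same revision, but history changed
```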
The second step is to physically remove all redundant data. This means
compacting the filesystem, i.e. rewriting it from BASE to HEAD according to the
Q&A section above. This may seem difficult; however, if the repository is
locked meanwhile, it can be done with the code currently available (building a
brand new repository from the old one). This step removes any confidential info
and reduces the space taken by the repository.
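As an illustration of the rewrite, here is a Python sketch that builds a new
list of revisions from an old one while dropping the marked changes. It rests on
several assumptions of mine: revisions hold full texts instead of deltas,
revision numbers are preserved (a fully removed revision becomes empty), and the
function name `compact` and the action names are hypothetical.

```python
def compact(old_repo, obliterated):
    """Rebuild a repository without the obliterated changes.

    old_repo    -- list of revisions; index 0 is revision 1; each revision maps
                   path -> (action, text), action in "add"/"modify"/"delete"
    obliterated -- set of (path, revision_number) pairs to drop
    """
    new_repo = []
    live = set()  # paths that exist in the rewritten history so far
    for number, changes in enumerate(old_repo, start=1):
        rewritten = {}
        for path, (action, text) in changes.items():
            if (path, number) in obliterated:
                continue  # this change never happened
            if action == "delete":
                if path in live:
                    live.discard(path)
                    rewritten[path] = ("delete", None)
            elif path not in live:
                live.add(path)
                rewritten[path] = ("add", text)      # the file is added later
            elif action == "add":
                rewritten[path] = ("replace", text)  # old file dies out here
            else:
                rewritten[path] = ("modify", text)
        new_repo.append(rewritten)
    return new_repo


old = [{"f": ("add", "secret")}, {"f": ("modify", "clean")},
       {"f": ("delete", None)}, {"f": ("add", "new")}]
for rev in compact(old, {("f", 1), ("f", 3)}):
    print(rev)
```

In this toy run the text "secret" no longer appears anywhere in the result,
which is the point of the second step.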
A note that can be ignored for now: There is no need for a full lock in the
second step. New commits may actually happen in the meantime. Only the old
revisions may not be modified. So the lock may be progressive as the rewrite
progresses. This leads to two locks: one that locks the HEAD of the repository
during a new commit, and one that (progressively) locks revisions from the BASE
while the rewrite is being done.
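One possible reading of the two locks, sketched in Python with the standard
`threading` module. Everything here is hypothetical (the class
`ProgressiveRewrite`, its fields, the idea of a watermark) and it ignores how
the real filesystem stores revisions.

```python
import threading


class ProgressiveRewrite:
    """Toy repository: commits append at HEAD while a rewrite walks up from BASE."""

    def __init__(self, revisions):
        self.revisions = list(revisions)
        self.head_lock = threading.Lock()  # serialises new commits at HEAD
        self.base_lock = threading.Lock()  # guards the rewrite watermark
        self.rewritten_upto = 0            # revisions 1..rewritten_upto are done

    def commit(self, changes):
        """Append a new revision; only the HEAD lock is needed."""
        with self.head_lock:
            self.revisions.append(changes)
            return len(self.revisions)

    def rewrite_next(self, transform):
        """Rewrite the next old revision; return False once HEAD is reached."""
        with self.base_lock:
            with self.head_lock:           # held briefly, just to read HEAD
                head = len(self.revisions)
            if self.rewritten_upto >= head:
                return False
            index = self.rewritten_upto
            self.revisions[index] = transform(index + 1, self.revisions[index])
            self.rewritten_upto += 1
            return True


repo = ProgressiveRewrite(["a", "b"])
repo.rewrite_next(lambda number, changes: changes.upper())
print(repo.commit("c"))  # a commit is accepted while the rewrite is in progress
while repo.rewrite_next(lambda number, changes: changes.upper()):
    pass
print(repo.revisions)
```

A commit never waits for the whole rewrite, only for the short moment in which
the rewriter reads the current HEAD.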
Original comment by jsimlo