Description
Okey dokey, so I stumbled across (and just fixed) a little bug whereby practically every time we made a mutable copy of a representation, we wrote it twice to the repository. This resulted in no integrity problems that I could see, but as of revision 188, there were 4906 mutable (and presumably unused) representations in our database. I'm pretty sure that there was no strings data duplication in effect, so we're only talking about a relatively small amount of unused data.

However, now is a good time to consider adding a "cleanup" subcommand to svnadmin. Here's what I propose:

1. Cruise through the `revisions' and `transactions' tables, tracking the node-ids of all the root nodes that matter in KEEPERS.
2. For each node-id in KEEPERS, recursively crawl the tree rooted at that node-id in the `nodes' table. Track all nodes we pass through in MOREKEEPERS, and all reps associated with those nodes in REPS.
3. For each representation key in REPS, dig around in the `representations' table to determine what strings are associated with that representation key, tracking those items in STRINGS.

At this point, all nodes not in KEEPERS or MOREKEEPERS are discardable. All representations not in REPS are discardable. All strings not in STRINGS are discardable.

4. Discard the discardable stuff!
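The four steps above amount to a mark-and-sweep pass over the tables. Here is a minimal Python sketch of the marking part, not the real filesystem code. It assumes each table can be modelled as a plain dict, that a node record exposes its representation keys and child node-ids under the hypothetical field names `reps` and `children`, and that a representation maps straight to a list of string keys. The function name `find_discardable` and all the sample keys are made up for illustration. It uses an explicit stack rather than recursion so that deep trees don't hit Python's recursion limit.

```python
def find_discardable(revisions, transactions, nodes, representations, strings):
    """Return (node_ids, rep_keys, string_keys) that nothing live refers to.

    All five arguments are dicts standing in for the tables of the same
    name; the record layout is an assumption made for this sketch.
    """
    # Step 1: the root node-id of every revision and transaction.
    keepers = set(revisions.values()) | set(transactions.values())

    # Step 2: crawl each tree, recording visited nodes and their reps.
    # more_keepers ends up containing the roots too, so it covers
    # "KEEPERS or MOREKEEPERS" on its own.
    more_keepers, reps = set(), set()
    stack = list(keepers)
    while stack:
        node_id = stack.pop()
        if node_id in more_keepers:
            continue  # subtree shared between revisions, already visited
        more_keepers.add(node_id)
        node = nodes[node_id]
        reps.update(node["reps"])
        stack.extend(node["children"])

    # Step 3: every string reachable from a live representation.
    live_strings = set()
    for rep_key in reps:
        live_strings.update(representations[rep_key])

    # Everything not marked is discardable (step 4 would delete it).
    return (
        set(nodes) - more_keepers,
        set(representations) - reps,
        set(strings) - live_strings,
    )


# Tiny made-up repository: two revisions, one transaction, one orphan
# node ("x9") and one doubly written representation ("r3dup").
revisions = {0: "n0", 1: "n1"}
transactions = {"t1": "n2"}
nodes = {
    "n0": {"reps": ["r0"], "children": []},
    "n1": {"reps": ["r1"], "children": ["f1"]},
    "n2": {"reps": ["r2"], "children": ["f1"]},
    "f1": {"reps": ["r3"], "children": []},
    "x9": {"reps": ["r9"], "children": []},
}
representations = {
    "r0": ["s0"], "r1": ["s1"], "r2": ["s2"],
    "r3": ["s3"], "r3dup": ["s3"], "r9": ["s9"],
}
strings = {"s0": b"", "s1": b"", "s2": b"", "s3": b"", "s9": b""}

dead_nodes, dead_reps, dead_strings = find_discardable(
    revisions, transactions, nodes, representations, strings
)
```

Note that in this sample the duplicate representation `r3dup` is discardable but the string `s3` it points at is not, because the live representation `r3` still uses it. That matches the situation described above, where the extra representations did not come with duplicated strings data.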
Issue Links
- depends upon
  - SVN-573 very long fs node IDs on bottleneck directories (Closed)
  - SVN-648 need FS dump/load format and tool (Closed)
  - SVN-531 undeltification improvements (Closed)
- is blocked by
  - SVN-573 very long fs node IDs on bottleneck directories (Closed)
  - SVN-648 need FS dump/load format and tool (Closed)
  - SVN-531 undeltification improvements (Closed)