Description
See the link below for discussion.

When branches are used, identical changes are often made to copied files, which wastes storage space. Using e.g. the MD5 hash as an index, it should be possible to find such duplicates and, instead of storing a new delta or even a fulltext, just store a reference to the other "inode" in the repository (the simplest case is [filename, revision]; an internal pointer would be better for speed).

Con: FSFS can no longer be append-only, because the indices have to be written and rewritten. Furthermore, I'd like the index to be more of a cache, so that it can be generated, deleted and regenerated at any time (at very high speed, as an MD5 is already archived for every file).

For FSFS I'd suggest a new directory that uses two levels of indirection down the hierarchy. E.g. for a file with an MD5 of 8a04f87ad04f4a1d3c7e6ca12e07290d:

    repository/
      dav/
      ...
      db/
        revs/
        revprops/
        transactions/
        md5index/
          8a/
            04/
              f87a.index

If this index has more than, say, 256 entries (which should be kept sorted in the file), the file could be split into 16 new parts.

I believe this could save a lot of space, especially in scenarios with many branches.
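To make the proposal concrete, here is a rough Python sketch of the path mapping, the duplicate lookup and the 16-way split. It is only an illustration under assumptions, not Subversion code: the names `index_path`, `Md5Index` and `split_index` are hypothetical, the index is kept in memory instead of in `.index` files, and the `db/md5index` root assumes the new directory lives under `db/`.

```python
import hashlib
from pathlib import PurePosixPath


def index_path(md5_hex, root="db/md5index"):
    """Map an MD5 hex digest to its index file.

    Two directory levels (hex digits 0-1 and 2-3), then a file named
    after hex digits 4-7. The root location is an assumption.
    """
    return PurePosixPath(root) / md5_hex[0:2] / md5_hex[2:4] / (md5_hex[4:8] + ".index")


class Md5Index:
    """In-memory stand-in for the proposed index: digest -> (filename, revision)."""

    def __init__(self):
        self._entries = {}

    def store(self, content, filename, revision):
        """Return (reference, reused).

        If the same digest was seen before, the earlier reference is
        returned and nothing new is recorded. A real implementation would
        presumably also compare the contents, since MD5 digests can collide.
        """
        digest = hashlib.md5(content).hexdigest()
        existing = self._entries.get(digest)
        if existing is not None:
            return existing, True
        self._entries[digest] = (filename, revision)
        return (filename, revision), False


def split_index(entries, prefix_len=8):
    """Split sorted (digest, reference) entries into up to 16 buckets.

    Buckets are keyed by the hex digit that follows the prefix already
    encoded in the path (8 digits for 8a/04/f87a.index). Input order is
    preserved, so sorted input yields sorted buckets.
    """
    buckets = {}
    for digest, ref in entries:
        buckets.setdefault(digest[prefix_len], []).append((digest, ref))
    return buckets
```

With the digest from the example above, `index_path("8a04f87ad04f4a1d3c7e6ca12e07290d")` gives `db/md5index/8a/04/f87a.index`. Because the index only maps digests that are already archived to existing references, it could be thrown away and rebuilt at any time, which matches the cache idea. How the 16 split files would be named on disk is left open here.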
http://marc.theaimsgroup.com/?l=subversion-dev&m=111319801911398&w=2
Original issue reported by pmarek