As described on the dev@ list[1], FSFS seems to have a bug in its management of
the serialized hash files it uses to track repository locks.
Until recently, the 'structure' document and coding technique employed indicated
that the intent was to mirror the structure of the versioned filesystem, with
datafiles that carried information about locks (if they represented versioned
files) and/or pointers to other datafiles which represent immediate children in
the directory case. The datafiles themselves employ Subversion's serialized
hash format, and have the nifty property of being named on disk by the MD5
digest of the versioned path they represent (and with some directory sharding in
place, too).
Thus the versioned path "/" uses an on-disk datafile named
"REPO/db/locks/666/6666cd76f96956469e7be39d750cc7d9", and that file (if present)
would contain pointers to other files representing the children of "/" in the
repository (perhaps "/trunk", "/branches", "/tags", "/README", etc.), but only
for those children under which some path was currently locked.
To answer the question, "Is path FOO locked?", you would need only to take the
MD5 digest of the path FOO, and look for the appropriate datafile. If you
didn't find one, path FOO was not locked in that repository. If you found one,
you parsed its contents for lock information (to ensure that the lock hadn't
expired, etc.)
To answer the question, "What paths under FOO are locked?", you would again look
at the datafile named by the MD5 digest of the path FOO, read it's list of
children, open their datafiles (some of which contained lock information, some
of which contained only pointers to their lock-enveloping immediate children),
and repeat until you'd crawled the whole virtual tree of locks.
FSFS's lock removal code jives with that documented algorithm.
But FSFS's lock storage code does not. Assume the scenario where the path
/trunk/src/file.c is locked. The FSFS code today correctly stores in the digest
file for /trunk/src a pointer to the digest file for /trunk/src/file.c (which
contains the lock info). But instead of storing in the digest file for /trunk a
pointer to /trunk/src's digest file, it stores another pointer to the digest
file for /trunk/src/file.c. The same pointer is also stored in the digest file
for the root directory.
In other words, if you lock a file 10 directory levels deep, there will be a
digest file for that path, plus 10 digest files (one for each of the parent
directories) which point directly to that digest file.
The benefit of this bug is that answer the question "What paths under FOO are
locked?" could potentially now be faster than even originally planned, because
every path locked under FOO is pointed to directly from FOO's digest file. You
read 'em all in one swoop, and then consult the individual digests for lock
details. No intermediate path traversal.
The downside is that because the lock deletion code and the lock creation code
don't speak the same algorithm, the lock deletion code leaves dangling
references to once-locked files littered all over those files' parent
directories. This means that if a directory FOO has 10,000 children and
grandchildren that have each ever been locked at some point or another,
answering the question "What files under FOO are locked?" now requires examining
10,000 dangling references.
The lock storage and lock removal algorithms need to match. Which way is best?
That I don't know.
[1] http://svn.haxx.se/dev/archive-2010-06/0333.shtml