Uploaded image for project: 'Subversion'
  1. Subversion
  2. SVN-3660

Consider revamping the FSFS lock management algorithm.

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.6.x
    • unscheduled
    • libsvn_fs_fs
    • None

    Description

      As described on the dev@ list[1], FSFS seems to have a bug in its management of
      the serialized hash files it uses to track repository locks.
      
      Until recently, the 'structure' document and coding technique employed indicated
      that the intent was to mirror the structure of the versioned filesystem, with
      datafiles that carried information about locks (if they represented versioned
      files) and/or pointers to other datafiles which represent immediate children in
      the directory case.  The datafiles themselves employ Subversion's serialized
      hash format, and have the nifty property of being named on disk by the MD5
      digest of the versioned path they represent (and with some directory sharding in
      place, too).
      
      Thus the versioned path "/" uses an on-disk datafile named
      "REPO/db/locks/666/6666cd76f96956469e7be39d750cc7d9", and that file (if present)
      would contain pointers to other files representing the children of "/" in the
      repository (perhaps "/trunk", "/branches", "/tags", "/README", etc.), but only
      for those children under which some path was currently locked.
      
      To answer the question, "Is path FOO locked?", you would need only to take the
      MD5 digest of the path FOO, and look for the appropriate datafile.  If you
      didn't find one, path FOO was not locked in that repository.  If you found one,
      you parsed its contents for lock information (to ensure that the lock hadn't
      expired, etc.)
      
      To answer the question, "What paths under FOO are locked?", you would again look
      at the datafile named by the MD5 digest of the path FOO, read it's list of
      children, open their datafiles (some of which contained lock information, some
      of which contained only pointers to their lock-enveloping immediate children),
      and repeat until you'd crawled the whole virtual tree of locks.
      
      FSFS's lock removal code jives with that documented algorithm.
      
      But FSFS's lock storage code does not.  Assume the scenario where the path
      /trunk/src/file.c is locked.  The FSFS code today correctly stores in the digest
      file for /trunk/src a pointer to the digest file for /trunk/src/file.c (which
      contains the lock info).  But instead of storing in the digest file for /trunk a
      pointer to /trunk/src's digest file, it stores another pointer to the digest
      file for /trunk/src/file.c.  The same pointer is also stored in the digest file
      for the root directory.
      
      In other words, if you lock a file 10 directory levels deep, there will be a
      digest file for that path, plus 10 digest files (one for each of the parent
      directories) which point directly to that digest file.
      
      The benefit of this bug is that answer the question "What paths under FOO are
      locked?" could potentially now be faster than even originally planned, because
      every path locked under FOO is pointed to directly from FOO's digest file.  You
      read 'em all in one swoop, and then consult the individual digests for lock
      details.  No intermediate path traversal.
      
      The downside is that because the lock deletion code and the lock creation code
      don't speak the same algorithm, the lock deletion code leaves dangling
      references to once-locked files littered all over those files' parent
      directories.  This means that if a directory FOO has 10,000 children and
      grandchildren that have each ever been locked at some point or another,
      answering the question "What files under FOO are locked?" now requires examining
      10,000 dangling references.
      
      The lock storage and lock removal algorithms need to match.  Which way is best?
       That I don't know.
      
      
      [1] http://svn.haxx.se/dev/archive-2010-06/0333.shtml
      

      Attachments

        1. 8_fsfs-rebuild-lock-storage-area.py
          6 kB
          C. Michael Pilato
        2. 7_issue-3660-with-patch-alt.txt
          6 kB
          C. Michael Pilato
        3. 6_issue-3660-with-patch-1.txt
          7 kB
          C. Michael Pilato
        4. 5_issue-3660-unpatched.txt
          8 kB
          C. Michael Pilato
        5. 4_issue-3660-recipe.sh
          6 kB
          C. Michael Pilato
        6. 3_issue-3660-patch-alt.txt
          1.0 kB
          C. Michael Pilato
        7. 2_issue-3660-patch-1.txt
          4 kB
          C. Michael Pilato
        8. 1_lock-all-unlock-all.sh
          5 kB
          C. Michael Pilato

        Activity

          People

            Unassigned Unassigned
            cmpilato C. Michael Pilato
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: