  Subversion / SVN-4877

FSFS commit failure should release txn proto-rev lock


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.x, 1.14.1
    • Fix Version/s: None
    • Component/s: libsvn_fs_fs
    • Labels: None

    Description

      Email thread: on dev@, 2020-06-10, "FSFS commit failure should release txn proto-rev lock", https://lists.apache.org/thread.html/r649aba731b6b01e90eebf26a5e2ba4ce8e806b4135aa75a0a082f990%40%3Cdev.subversion.apache.org%3E

      Quoting from that thread:

      TL;DR: I propose a change to the FSFS commit-transaction function, to
      release the proto-rev write lock if an error occurs while it has this lock.

      The practical applications of this change are rather obscure, which
      perhaps explains why it has not been needed before. In particular, it
      apparently is not needed for the way the rest of standard Subversion
      drives FSFS, but may be needed for other users of FSFS. I have come
      across this case in WANdisco's replicator, but as there are other
      peculiarities in how that drives FSFS, let us not confuse the issue by
      looking too closely at it. It appears the issue would apply to other
      users of FSFS too.

      In the FSFS commit-transaction code path (in svn_fs_fs__commit) there is
      a region where it acquires an exclusive write lock on the prototype
      revision (proto-rev). Code in this region can fail in some cases, and
      the error return path does not release the lock. That means that if the
      calling process re-tries the commit, the "writing" flag is still set in
      the in-memory transaction object, and the re-try fails with an "already
      locked" error.

      In regular Subversion we abandon a transaction if it fails at this
      stage, and so never hit the problem. There are failure modes where a
      re-try could not succeed, notably after we move the proto-rev file into
      its final location, breaking the transaction; this case is called out in
      comments in the function and will remain after this change. Abandoning
      the transaction is a safe and effective way to use FSFS. However, other
      users of FSFS may prefer to re-try in certain other cases.

      The case WANdisco encountered is where some old repository corruption
      (SVN-4858) was detected and reported at some point in this code region.
      Although the commit would not be able to succeed, it was important to
      them that the same error be reported again during a re-try; what was
      observed instead was an "already locked" error.

      I suppose a temporarily inaccessible disk is one class of error where a
      re-try might succeed.

      The attached test and patch demonstrate and fix the problem.

      This patch does not attempt to make it possible to re-try a failed
      commit in all cases. Some remaining cases are noted in the patch log
      message.

      Attachments

        1. test-release-proto-rev-lock-7.tgz (3 kB, Julian Foad)
        2. svn-release-proto-rev-lock-7.patch (10 kB, Julian Foad)

          People

            Assignee: Unassigned
            Reporter: Julian Foad (julianfoad)
            Votes: 0
            Watchers: 1
