Subversion / SVN-1573

fs commit-deltification doesn't scale




         If you add 5 bytes to a 256 meg file and commit, it takes many
         minutes for svn_fs_merge() to return success, because it's
         deltifying the previous version of the file against the new
         one.

         Because this happens as a built-in part of every commit, it
         destroys svn's ability to commit changes to large files.  When
         operating over dav, neon times out waiting for the final 'MERGE'
         request to return success.  And even over ra_svn, it's not
         acceptable for users to wait many, many minutes for the commit
         to finish.

         The fact that the repository stores non-HEAD versions of files as
         deltas is an optimization (a deliberate space/time tradeoff) and an
         internal implementation detail.  We shouldn't be punishing users
         for this.

         We've discussed a few solutions:
           * do nothing.  just tell users to increase neon timeouts to huge
             values when committing changes to large files.  And to twiddle
             their thumbs for a long time.
           * prevent deltification on files over a certain size.  The
             problem with this, of course, is that large files are the very
             ones that actually affect the repository size -- the whole reason
             we're doing deltification at all.
           * prevent deltification on files over a certain size, but create
             some sort of out-of-band compression command -- something like
             'svnadmin deltify/compress/whatever' that a sysadmin or cron
             job can run during non-peak hours to reclaim disk space.
           * make svn_fs_merge() spawn a deltification thread (using APR
             threads) and return success immediately.  If the thread fails
             to deltify, it's not the end of the world: we simply don't get
             the disk-space savings.
           * a better, post-1.0 long-term solution: it takes N minutes for
             the client to figure out the correct vdelta data to send to the
             repository, then takes another N minutes for the repository to
             calculate the *same* delta in reverse!  This is a huge waste of
             time.  It would be nice, someday, to remember the vdelta
             windows and invert them when deltifying after the commit.
             (vdelta *is* invertible if you have both fulltexts, which the
             repository does.)  And it would be very quick, too.
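
To make the last idea concrete, here is a rough sketch of inverting a
delta without searching for matches again.  It is NOT Subversion's
svndiff/vdelta code: the op encoding and the names apply_delta() and
invert_delta() are made up for illustration, and it assumes every copy
op reads only from the old text (real vdelta windows may also copy from
the new text, which this ignores).  The point is just that inversion is
one pass over the ops, plus literal bytes taken from the old fulltext
for any range that no copy op covers.

```python
from typing import List, Tuple, Union

# Hypothetical op encoding (not the svndiff format):
#   ("copy", offset, length)  copy bytes from the base text
#   ("insert", data)          emit literal bytes
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild a text by running ops against a base text."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += base[off:off + length]
        else:
            out += op[1]
    return bytes(out)


def invert_delta(old: bytes, forward: List[Op]) -> List[Op]:
    """Turn an old->new delta into a new->old delta.

    Assumes every copy op in `forward` reads from `old` only.
    """
    # For each copy, remember which old range landed at which new offset.
    spans = []  # (old_offset, length, new_offset)
    new_pos = 0
    for op in forward:
        if op[0] == "copy":
            _, off, length = op
            spans.append((off, length, new_pos))
            new_pos += length
        else:
            new_pos += len(op[1])
    spans.sort()

    reverse: List[Op] = []
    pos = 0  # next byte of the old text still to be produced
    for old_off, length, new_off in spans:
        end = old_off + length
        if end <= pos:
            continue  # this old range is already covered
        if old_off > pos:
            # Nothing in the new text holds these bytes: take them
            # from the old fulltext, which the repository still has.
            reverse.append(("insert", old[pos:old_off]))
            pos = old_off
        skip = pos - old_off
        reverse.append(("copy", new_off + skip, length - skip))
        pos = end
    if pos < len(old):
        reverse.append(("insert", old[pos:]))
    return reverse


# The case from this issue, scaled down: append 5 bytes to a file.
old = b"hello world, " * 100
forward = [("copy", 0, len(old)), ("insert", b"12345")]
new = apply_delta(old, forward)
reverse = invert_delta(old, forward)
restored = apply_delta(new, reverse)
```

In this toy case the reverse delta is a single copy op, and building it
never compares file contents.  Whether the real window format allows
the same shortcut is exactly what would have to be checked.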




            Assignee: Unassigned
            Reporter: Ben Collins-Sussman (sussman)