Uploaded image for project: 'Subversion'
  1. Subversion
  2. SVN-4351

improve automatic merge performance

    XMLWordPrintableJSON

Details

    Description

      The fix for issue #4329 and issue #4258 (made in r1464102) makes the
      symmetric/automatic merge algorithm more tolerant of cherry picks and subtree
      merges (the changes this made to the algorithm can be seen here:
      http://wiki.apache.org/subversion/SymmetricMerge?action=diff&rev1=96&rev2=97). 
      However this correctness comes with a performance price as detailed below:
      
      If we don't specify any merge ranges, svn_client_merge_peg5() needs to decide if
      we want (in 1.7 parlance) a "sync style" or "reintegrate style" merge (I'm not
      sure how useful this terminology is going forward, but the code in trunk is
      still largely the same as 1.7, so it still has some relevance).
      
      Let's look at the current automatic merge algorithm
      (http://wiki.apache.org/subversion/SymmetricMerge):
      
        This algorithm provides a symmetric merge with skipping of
        source revisions that have already been cherry-picked to
        the target branch.
      
          1) Find the latest rev of A synced to B and of B synced to A.
          2) Choose a base.
          3) Identify cherry-picks.
          4) Break into 3-way merges, skipping the cherry-picks.
          5) Perform the 3-way merges and mergeinfo addition. 
      
      What specifically are we looking for?  Again from the wiki:
      
        1) Find the latest revision of A synced to B and the latest revision
           of B synced to A.
      
           * This means, for the A to B direction, the youngest revision
             'rN' on A at which all revisions of A, up to and including rN,
             have effectively been merged into B. "Effectively...merged"
             means that one of the following is true for every revision X
             on A, where X <= rN:
      
               a) The merge of rX from A -> B is recorded in the mergeinfo on
                  the version of B that is being used as the merge target.
                  Usually the result corresponds to the most recent merge into
                  B from A, but not necessarily, because later revisions from
                  A might previously have been cherry-picked into B.
      
               b) The merge of rX from Subtree-of-A -> Subtree-of-B is recorded
                  in the subtree mergeinfo on the version of B that is being
                  used as the merge target *and* rX is operative only within
                  Subtree-of-A (i.e. the subtree merge of rX, if repeated at
                  the root of A <-> B would be inoperative except for mergeinfo
                  changes).
      
               c) rX is inoperative on A. 
      
           * We consider only direct A <-> B mergeinfo (including subtree
             mergeinfo under A or B). (Consideration of a merge graph extending
             to other branches is future work. Since we record mergeinfo
             transitively, considering only direct A<->B mergeinfo already
             suffices for some 3-branch scenarios too, but not all such
             scenarios.) 
      
      So how do we do this currently:
      
      Let's start with libsvn_client/merge.c:find_automatic_merge() where where the
      work to solve step #1 in the algorithm above happens:
      
        /* Find the latest revision of A synced to B and the latest
         * revision of B synced to A.
         *
         *   base_on_source = youngest_complete_synced_point(source, target)
         *   base_on_target = youngest_complete_synced_point(target, source)
         */
        SVN_ERR(find_base_on_source(&base_on_source, s_t,
                                    ctx, scratch_pool, scratch_pool));
        SVN_ERR(find_base_on_target(&base_on_target, s_t,
                                    ctx, scratch_pool, scratch_pool));
                                    
      find_base_on_[source|target] are both just wrappers around
      find_last_merged_location() where the interesting stuff happens.  We know the
      following:
      
                /---A---A_TIP@A-PEG-REV>
               /    ^
              /     |
             /   Possible merges of all flavors
      YCA----    (syncs, cherrypicks, subtree)
             \   or maybe *no* merges
              \     |
               \    V
                \---B---B_TIP>
      
      So how do we find the latest revision of A synced to B (with all the caveats
      mentioned above)?  Sure, we could just look at the mergeinfo on the root of B,
      that's quick, but overly naive since it doesn't take subtree merges and/or
      cherrypicks from A to B into consideration (see
      http://subversion.tigris.org/issues/show_bug.cgi?id=4329,
      http://svn.haxx.se/dev/archive-2013-02/0459.shtml, and
      http://subversion.tigris.org/issues/show_bug.cgi?id=4258 as to why).
      
      Turns out that svn_client_mergeinfo_log2() does almost exactly what we need:
      Given two branches, A and B and a revision range, that API can return (via a
      svn_log_entry_receiver_t callback) every fully|partially merged|elgibile
      revision from A->B.  Even better, it can optionally consider all the subtree
      merges from A->B.
      
      So that's the API we use.  First we find the youngest revision merged from A->B:
      
      svn_client_mergeinfo_log2(finding_merged=TRUE,
                                target_path_or_url=B_URL,
                                target_peg_revision=B_TIP,
                                source_path_or_url=A_URL,
                                source_peg_revision=A_PEG_REV,
                                source_start_revision=A_PEG_REV,
                                source_end_revision=YCA,
                                ...
                                depth=svn_depth_infinity /* consider subtree merges */
                                ...);
      
      Yes, we ask for the callback to be run "backwards" from the tip of A to the YCA.
       We do this since we want the youngest merged rev.  Once we find it the receiver
      raises a SVN_ERR_CEASE_INVOCATION, so there is only a single callback made.
      
      If no merged revs are found, then YCA latest revision of A "synced" to B and
      we're done.  Assuming a merged rev is found, then we aren't done.  The first
      call to svn_client_mergeinfo_log2 only provided us with the oldest merged
      revision 'N'.  We don't know if the revisions in the range YCA <= ? < N are also
      merged, noops, or eligible:
      
      
                 /---A---A_TIP@A-PEG-REV>
                /    |
               /  All revs between YCA and A-PEG-REV
              /   fall into one of these 3 categories:
             /   |------|----|--------|
      YCA----    |merged|noop|eligible|
             \   |------|----|--------|
              \     |
               \    V
                \---B---B_TIP>
      
      If the revisions are all merged or noops then 'N' is the youngest fully synced
      revision, but if *any* are eligible, then this is not the case.  So we need to
      call svn_client_mergeinfo_log2 a second time.  This time we ask for the oldest
      eligible revision 'M':
      
      svn_client_mergeinfo_log2(finding_merged=FALSE, /* Find eligible*/
                                target_path_or_url=B_URL,
                                target_peg_revision=B_TIP,
                                source_path_or_url=A_URL,
                                source_peg_revision=A_PEG_REV,
                                source_start_revision=YCA,
                                source_end_revision=M, /* Limit to older then 'M' */
                                ...
                                depth=svn_depth_infinity /* consider subtree merges */
                                ...);
      
      If 'M' is found, then M-1 is the latest revision of A synced to B, otherwise it
      is N.
      
      ~~~~~
      
      The problem of course, is that each call to svn_client_mergeinfo_log2() makes a
      call to svn_client_log5() and in the worst case we can have 4 such calls (2 to
      find the base on A->B and 2 to find the base on B->A).
      
      These roundtrips can be time consuming and they occur at the start of the merge
      when, as far as the user can see, nothing has happened yet.
      
      The following are some examples from our own repository.  The elapsed times
      shown are from the start of the merge until merge.c:find_automatic_merge() finds
      the base on the source and the target, i.e. breaking here:
      
      C:\SVN\src-trunk-4>svn diff
      Index: subversion/libsvn_client/merge.c
      ===================================================================
      --- subversion/libsvn_client/merge.c    (revision 1464101)
      +++ subversion/libsvn_client/merge.c    (working copy)
      @@ -12219,7 +12219,7 @@
                                     ctx, scratch_pool, scratch_pool));
         SVN_ERR(find_base_on_target(&base_on_target, s_t,
                                     ctx, scratch_pool, scratch_pool));
      -
      +return svn_error_create(SVN_ERR_CEASE_INVOCATION, NULL, NULL);
         /* Choose a base. */
         if (base_on_source->rev >= base_on_target->rev)
           {
      
      r1464101 is a recent trunk build immediately before the issue #4329 fix which
      replaced the naive reads of mergeinfo on the roots of the branch with the
      correct (but slower) use of svn_client_mergeinfo_log2() described above.
      
      SYNC STYLE MERGES:
      ------------------
      
      TARGET
      ^/subversion/branches/fsfs-format7@1467411
      MERGE
      svn merge ^/subversion/trunk
      CLIENT trunk@1464101(just prior to issue #4329 fix)
      TimeThis :  Elapsed Time :  00:00:02.932
      TimeThis :  Elapsed Time :  00:00:02.932
      TimeThis :  Elapsed Time :  00:00:02.979
      CLIENT trunk@1467329
      TimeThis :  Elapsed Time :  00:00:09.063
      TimeThis :  Elapsed Time :  00:00:08.767
      TimeThis :  Elapsed Time :  00:00:08.798
      
      TARGET
      ^/subversion/branches/ignore-mergeinfo@1467411
      MERGE
      svn merge ^/subversion/trunk
      CLIENT trunk@1464101(just prior to issue #4329 fix)
      TimeThis :  Elapsed Time :  00:00:02.979
      TimeThis :  Elapsed Time :  00:00:02.979
      TimeThis :  Elapsed Time :  00:00:02.995
      CLIENT trunk@1467329
      TimeThis :  Elapsed Time :  00:00:09.484
      TimeThis :  Elapsed Time :  00:00:08.611
      TimeThis :  Elapsed Time :  00:00:08.626
      
      TARGET
      ^/subversion/branches/ev2-export@1467411
      MERGE
      svn merge ^/subversion/trunk
      CLIENT trunk@1464101 (just prior to issue #4329 fix)
      TimeThis :  Elapsed Time :  00:00:02.964
      TimeThis :  Elapsed Time :  00:00:02.948
      TimeThis :  Elapsed Time :  00:00:02.964
      CLINT trunk@1467329
      TimeThis :  Elapsed Time :  00:00:21.996
      TimeThis :  Elapsed Time :  00:00:20.916
      TimeThis :  Elapsed Time :  00:00:20.826
      
      The significantly slower times of this last merge are likely attributable to the
      fact that some changes have been cherry picked from
      ^/subversion/branches/ev2-export to ^/subversion/trunk.  This means that there
      is no early exit in find_last_merged_location() after the first call to
      svn_client_mergeinfo_log2() to find the youngest revision merged from the
      ev2-export branch to trunk.  In the other two cases, there have only been sync
      merges from trunk to the branch, so there is one less svn_client_mergeinfo_log2
      call. 
      
      REINTEGRATE STYLE MERGES:
      -------------------------
      
      TARGET
      ^/subversion/trunk@1467411
      MERGE
      svn merge ^/subversion/branches/fsfs-format7
      CLIENT trunk@1464101(just prior to issue #4329 fix)
      TimeThis :  Elapsed Time :  00:00:02.995
      TimeThis :  Elapsed Time :  00:00:02.948
      TimeThis :  Elapsed Time :  00:00:02.980
      CLIENT trunk@1467329
      TimeThis :  Elapsed Time :  00:00:08.860
      TimeThis :  Elapsed Time :  00:00:08.782
      TimeThis :  Elapsed Time :  00:00:08.736
      
      TARGET
      ^/subversion/trunk@1467411
      MERGE
      svn merge ^/subversion/branches/ev2-export
      CLIENT trunk@1464101(just prior to issue #4329 fix)
      TimeThis :  Elapsed Time :  00:00:02.964
      TimeThis :  Elapsed Time :  00:00:02.979
      TimeThis :  Elapsed Time :  00:00:02.886
      CLIENT trunk@1467329
      TimeThis :  Elapsed Time :  00:00:21.418
      TimeThis :  Elapsed Time :  00:00:21.325
      TimeThis :  Elapsed Time :  00:00:20.560
      
      TARGET
      ^/subversion/trunk@1467411
      MERGE
      svn merge ^/subversion/branches/ev2-export
      CLIENT trunk@1464101(just prior to issue #4329 fix)
      TimeThis :  Elapsed Time :  00:00:02.995
      TimeThis :  Elapsed Time :  00:00:02.948
      TimeThis :  Elapsed Time :  00:00:02.948
      CLIENT trunk@1467329
      TimeThis :  Elapsed Time :  00:00:09.001
      TimeThis :  Elapsed Time :  00:00:08.704
      TimeThis :  Elapsed Time :  00:00:08.673
      
      ~~~~~
      
      So how to improve performance.  A few potential ideas:
      
      1) A client side log cache
      
      2) A new RA API to push the calculation of the base on the source branch and the
      base on the target branch to the server.
      
      3) Are there trivial cases we can optimize for?  e.g. no subtree mergeinfo
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            pburba Paul Burba
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: