[SVN-4351] improve automatic merge performance - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: trunk
Fix Version/s: 1.10-consider
Component/s: unknown
Labels:
- merge-tracking

Description

The fix for issue #4329 and issue #4258 (made in r1464102) makes the
symmetric/automatic merge algorithm more tolerant of cherry picks and subtree
merges (the changes this made to the algorithm can be seen here:
http://wiki.apache.org/subversion/SymmetricMerge?action=diff&rev1=96&rev2=97). 
However this correctness comes with a performance price as detailed below:

If we don't specify any merge ranges, svn_client_merge_peg5() needs to decide if
we want (in 1.7 parlance) a "sync style" or "reintegrate style" merge (I'm not
sure how useful this terminology is going forward, but the code in trunk is
still largely the same as 1.7, so it still has some relevance).

Let's look at the current automatic merge algorithm
(http://wiki.apache.org/subversion/SymmetricMerge):

  This algorithm provides a symmetric merge with skipping of
  source revisions that have already been cherry-picked to
  the target branch.

    1) Find the latest rev of A synced to B and of B synced to A.
    2) Choose a base.
    3) Identify cherry-picks.
    4) Break into 3-way merges, skipping the cherry-picks.
    5) Perform the 3-way merges and mergeinfo addition. 

What specifically are we looking for?  Again from the wiki:

  1) Find the latest revision of A synced to B and the latest revision
     of B synced to A.

     * This means, for the A to B direction, the youngest revision
       'rN' on A at which all revisions of A, up to and including rN,
       have effectively been merged into B. "Effectively...merged"
       means that one of the following is true for every revision X
       on A, where X <= rN:

         a) The merge of rX from A -> B is recorded in the mergeinfo on
            the version of B that is being used as the merge target.
            Usually the result corresponds to the most recent merge into
            B from A, but not necessarily, because later revisions from
            A might previously have been cherry-picked into B.

         b) The merge of rX from Subtree-of-A -> Subtree-of-B is recorded
            in the subtree mergeinfo on the version of B that is being
            used as the merge target *and* rX is operative only within
            Subtree-of-A (i.e. the subtree merge of rX, if repeated at
            the root of A <-> B would be inoperative except for mergeinfo
            changes).

         c) rX is inoperative on A. 

     * We consider only direct A <-> B mergeinfo (including subtree
       mergeinfo under A or B). (Consideration of a merge graph extending
       to other branches is future work. Since we record mergeinfo
       transitively, considering only direct A<->B mergeinfo already
       suffices for some 3-branch scenarios too, but not all such
       scenarios.) 

So how do we do this currently:

Let's start with libsvn_client/merge.c:find_automatic_merge() where where the
work to solve step #1 in the algorithm above happens:

  /* Find the latest revision of A synced to B and the latest
   * revision of B synced to A.
   *
   *   base_on_source = youngest_complete_synced_point(source, target)
   *   base_on_target = youngest_complete_synced_point(target, source)
   */
  SVN_ERR(find_base_on_source(&base_on_source, s_t,
                              ctx, scratch_pool, scratch_pool));
  SVN_ERR(find_base_on_target(&base_on_target, s_t,
                              ctx, scratch_pool, scratch_pool));
                              
find_base_on_[source|target] are both just wrappers around
find_last_merged_location() where the interesting stuff happens.  We know the
following:

          /---A---A_TIP@A-PEG-REV>
         /    ^
        /     |
       /   Possible merges of all flavors
YCA----    (syncs, cherrypicks, subtree)
       \   or maybe *no* merges
        \     |
         \    V
          \---B---B_TIP>

So how do we find the latest revision of A synced to B (with all the caveats
mentioned above)?  Sure, we could just look at the mergeinfo on the root of B,
that's quick, but overly naive since it doesn't take subtree merges and/or
cherrypicks from A to B into consideration (see
http://subversion.tigris.org/issues/show_bug.cgi?id=4329,
http://svn.haxx.se/dev/archive-2013-02/0459.shtml, and
http://subversion.tigris.org/issues/show_bug.cgi?id=4258 as to why).

Turns out that svn_client_mergeinfo_log2() does almost exactly what we need:
Given two branches, A and B and a revision range, that API can return (via a
svn_log_entry_receiver_t callback) every fully|partially merged|elgibile
revision from A->B.  Even better, it can optionally consider all the subtree
merges from A->B.

So that's the API we use.  First we find the youngest revision merged from A->B:

svn_client_mergeinfo_log2(finding_merged=TRUE,
                          target_path_or_url=B_URL,
                          target_peg_revision=B_TIP,
                          source_path_or_url=A_URL,
                          source_peg_revision=A_PEG_REV,
                          source_start_revision=A_PEG_REV,
                          source_end_revision=YCA,
                          ...
                          depth=svn_depth_infinity /* consider subtree merges */
                          ...);

Yes, we ask for the callback to be run "backwards" from the tip of A to the YCA.
 We do this since we want the youngest merged rev.  Once we find it the receiver
raises a SVN_ERR_CEASE_INVOCATION, so there is only a single callback made.

If no merged revs are found, then YCA latest revision of A "synced" to B and
we're done.  Assuming a merged rev is found, then we aren't done.  The first
call to svn_client_mergeinfo_log2 only provided us with the oldest merged
revision 'N'.  We don't know if the revisions in the range YCA <= ? < N are also
merged, noops, or eligible:


           /---A---A_TIP@A-PEG-REV>
          /    |
         /  All revs between YCA and A-PEG-REV
        /   fall into one of these 3 categories:
       /   |------|----|--------|
YCA----    |merged|noop|eligible|
       \   |------|----|--------|
        \     |
         \    V
          \---B---B_TIP>

If the revisions are all merged or noops then 'N' is the youngest fully synced
revision, but if *any* are eligible, then this is not the case.  So we need to
call svn_client_mergeinfo_log2 a second time.  This time we ask for the oldest
eligible revision 'M':

svn_client_mergeinfo_log2(finding_merged=FALSE, /* Find eligible*/
                          target_path_or_url=B_URL,
                          target_peg_revision=B_TIP,
                          source_path_or_url=A_URL,
                          source_peg_revision=A_PEG_REV,
                          source_start_revision=YCA,
                          source_end_revision=M, /* Limit to older then 'M' */
                          ...
                          depth=svn_depth_infinity /* consider subtree merges */
                          ...);

If 'M' is found, then M-1 is the latest revision of A synced to B, otherwise it
is N.

~~~~~

The problem of course, is that each call to svn_client_mergeinfo_log2() makes a
call to svn_client_log5() and in the worst case we can have 4 such calls (2 to
find the base on A->B and 2 to find the base on B->A).

These roundtrips can be time consuming and they occur at the start of the merge
when, as far as the user can see, nothing has happened yet.

The following are some examples from our own repository.  The elapsed times
shown are from the start of the merge until merge.c:find_automatic_merge() finds
the base on the source and the target, i.e. breaking here:

C:\SVN\src-trunk-4>svn diff
Index: subversion/libsvn_client/merge.c
===================================================================
--- subversion/libsvn_client/merge.c    (revision 1464101)
+++ subversion/libsvn_client/merge.c    (working copy)
@@ -12219,7 +12219,7 @@
                               ctx, scratch_pool, scratch_pool));
   SVN_ERR(find_base_on_target(&base_on_target, s_t,
                               ctx, scratch_pool, scratch_pool));
-
+return svn_error_create(SVN_ERR_CEASE_INVOCATION, NULL, NULL);
   /* Choose a base. */
   if (base_on_source->rev >= base_on_target->rev)
     {

r1464101 is a recent trunk build immediately before the issue #4329 fix which
replaced the naive reads of mergeinfo on the roots of the branch with the
correct (but slower) use of svn_client_mergeinfo_log2() described above.

SYNC STYLE MERGES:
------------------

TARGET
^/subversion/branches/fsfs-format7@1467411
MERGE
svn merge ^/subversion/trunk
CLIENT trunk@1464101(just prior to issue #4329 fix)
TimeThis :  Elapsed Time :  00:00:02.932
TimeThis :  Elapsed Time :  00:00:02.932
TimeThis :  Elapsed Time :  00:00:02.979
CLIENT trunk@1467329
TimeThis :  Elapsed Time :  00:00:09.063
TimeThis :  Elapsed Time :  00:00:08.767
TimeThis :  Elapsed Time :  00:00:08.798

TARGET
^/subversion/branches/ignore-mergeinfo@1467411
MERGE
svn merge ^/subversion/trunk
CLIENT trunk@1464101(just prior to issue #4329 fix)
TimeThis :  Elapsed Time :  00:00:02.979
TimeThis :  Elapsed Time :  00:00:02.979
TimeThis :  Elapsed Time :  00:00:02.995
CLIENT trunk@1467329
TimeThis :  Elapsed Time :  00:00:09.484
TimeThis :  Elapsed Time :  00:00:08.611
TimeThis :  Elapsed Time :  00:00:08.626

TARGET
^/subversion/branches/ev2-export@1467411
MERGE
svn merge ^/subversion/trunk
CLIENT trunk@1464101 (just prior to issue #4329 fix)
TimeThis :  Elapsed Time :  00:00:02.964
TimeThis :  Elapsed Time :  00:00:02.948
TimeThis :  Elapsed Time :  00:00:02.964
CLINT trunk@1467329
TimeThis :  Elapsed Time :  00:00:21.996
TimeThis :  Elapsed Time :  00:00:20.916
TimeThis :  Elapsed Time :  00:00:20.826

The significantly slower times of this last merge are likely attributable to the
fact that some changes have been cherry picked from
^/subversion/branches/ev2-export to ^/subversion/trunk.  This means that there
is no early exit in find_last_merged_location() after the first call to
svn_client_mergeinfo_log2() to find the youngest revision merged from the
ev2-export branch to trunk.  In the other two cases, there have only been sync
merges from trunk to the branch, so there is one less svn_client_mergeinfo_log2
call. 

REINTEGRATE STYLE MERGES:
-------------------------

TARGET
^/subversion/trunk@1467411
MERGE
svn merge ^/subversion/branches/fsfs-format7
CLIENT trunk@1464101(just prior to issue #4329 fix)
TimeThis :  Elapsed Time :  00:00:02.995
TimeThis :  Elapsed Time :  00:00:02.948
TimeThis :  Elapsed Time :  00:00:02.980
CLIENT trunk@1467329
TimeThis :  Elapsed Time :  00:00:08.860
TimeThis :  Elapsed Time :  00:00:08.782
TimeThis :  Elapsed Time :  00:00:08.736

TARGET
^/subversion/trunk@1467411
MERGE
svn merge ^/subversion/branches/ev2-export
CLIENT trunk@1464101(just prior to issue #4329 fix)
TimeThis :  Elapsed Time :  00:00:02.964
TimeThis :  Elapsed Time :  00:00:02.979
TimeThis :  Elapsed Time :  00:00:02.886
CLIENT trunk@1467329
TimeThis :  Elapsed Time :  00:00:21.418
TimeThis :  Elapsed Time :  00:00:21.325
TimeThis :  Elapsed Time :  00:00:20.560

TARGET
^/subversion/trunk@1467411
MERGE
svn merge ^/subversion/branches/ev2-export
CLIENT trunk@1464101(just prior to issue #4329 fix)
TimeThis :  Elapsed Time :  00:00:02.995
TimeThis :  Elapsed Time :  00:00:02.948
TimeThis :  Elapsed Time :  00:00:02.948
CLIENT trunk@1467329
TimeThis :  Elapsed Time :  00:00:09.001
TimeThis :  Elapsed Time :  00:00:08.704
TimeThis :  Elapsed Time :  00:00:08.673

~~~~~

So how to improve performance.  A few potential ideas:

1) A client side log cache

2) A new RA API to push the calculation of the base on the source branch and the
base on the target branch to the server.

3) Are there trivial cases we can optimize for?  e.g. no subtree mergeinfo

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Paul Burba

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 12/Apr/13 20:39

Updated:: 19/Feb/15 14:57