Lucene - Core
  1. Lucene - Core
  2. LUCENE-5895

Add per-segment and per-commit id to help replication

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It would be useful if Lucene recorded a unique id for each segment written and each commit point. This way, file-based replicators can use this to "know" whether the segment/commit they are looking at on a source machine and dest machine are in fact that same.

      I know this would have been very useful when I was playing with NRT replication (LUCENE-5438).

      1. LUCENE-5895.patch
        25 kB
        Michael McCandless
      2. LUCENE-5895.patch
        19 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Initial patch ...

        Show
        Michael McCandless added a comment - Initial patch ...
        Show
        Uwe Schindler added a comment - - edited Why not use Java's UUID class? http://docs.oracle.com/javase/7/docs/api/java/util/UUID.html - especially: http://docs.oracle.com/javase/7/docs/api/java/util/UUID.html#randomUUID()
        Hide
        Michael McCandless added a comment -

        I looked at UUID, and used UUID.randomUUID() in early iterations at first,
        but then decided it's a poor fit here:

        • It tries to be crypographically secure, which is overkill for us:
          we don't care if someone can guess what the next segment id will
          be.
        • It uses SecureRandom to pull randomness, which on Linux can easily
          lead to long hangs (I saw ~10 second hangs) when there's not
          enough entropy being harvested.
        • It loses a few (6 of 128) bits to version/variant, which weakens
          it a bit for our use case. In particular, it's not clear how that
          impacts the period, where with simple ++ (mod 2^128) the period is
          clearly full.
        Show
        Michael McCandless added a comment - I looked at UUID, and used UUID.randomUUID() in early iterations at first, but then decided it's a poor fit here: It tries to be crypographically secure, which is overkill for us: we don't care if someone can guess what the next segment id will be. It uses SecureRandom to pull randomness, which on Linux can easily lead to long hangs (I saw ~10 second hangs) when there's not enough entropy being harvested. It loses a few (6 of 128) bits to version/variant, which weakens it a bit for our use case. In particular, it's not clear how that impacts the period, where with simple ++ (mod 2^128) the period is clearly full.
        Hide
        Michael McCandless added a comment -

        New patch: I hit a failure in the test I added when it got an older codec that doesn't write the id, and fixed SegmentInfo ctor to not generate its own id (that's too dangerous). I also added a comment explaining why UUID isn't a good match for our use ...

        Show
        Michael McCandless added a comment - New patch: I hit a failure in the test I added when it got an older codec that doesn't write the id, and fixed SegmentInfo ctor to not generate its own id (that's too dangerous). I also added a comment explaining why UUID isn't a good match for our use ...
        Hide
        Michael McCandless added a comment -

        I think the last patch is ready ... I think it's net/net low risk, since it just adds a new field to SIS/SI, so I'd like to commit for 4.10. We can always improve the uniqueness of the id generation later (it's opaque).

        Show
        Michael McCandless added a comment - I think the last patch is ready ... I think it's net/net low risk, since it just adds a new field to SIS/SI, so I'd like to commit for 4.10. We can always improve the uniqueness of the id generation later (it's opaque).
        Hide
        Uwe Schindler added a comment -

        I am fine with the patch.

        Show
        Uwe Schindler added a comment - I am fine with the patch.
        Hide
        Michael McCandless added a comment -

        Since we are so close to releasing 4.10, I think this change should wait for the next release ...

        Show
        Michael McCandless added a comment - Since we are so close to releasing 4.10, I think this change should wait for the next release ...
        Hide
        ASF subversion and git services added a comment -

        Commit 1619624 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1619624 ]

        LUCENE-5895: fix version in javadocs

        Show
        ASF subversion and git services added a comment - Commit 1619624 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1619624 ] LUCENE-5895 : fix version in javadocs
        Hide
        ASF subversion and git services added a comment -

        Commit 1622351 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1622351 ]

        LUCENE-5895: fix bug (when 2nd long from xorshift was negative) causing ids to be prefixed by 16 0s

        Show
        ASF subversion and git services added a comment - Commit 1622351 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1622351 ] LUCENE-5895 : fix bug (when 2nd long from xorshift was negative) causing ids to be prefixed by 16 0s
        Hide
        ASF subversion and git services added a comment -

        Commit 1622352 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1622352 ]

        LUCENE-5895: fix bug (when 2nd long from xorshift was negative) causing ids to be prefixed by 16 0s

        Show
        ASF subversion and git services added a comment - Commit 1622352 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1622352 ] LUCENE-5895 : fix bug (when 2nd long from xorshift was negative) causing ids to be prefixed by 16 0s
        Hide
        Shalin Shekhar Mangar added a comment -

        Mike, this works fine for NRT replication but replying on it for recovery would probably cause the entire index to be re-downloaded because local merges on replicas will create segments with a different unique id.

        Show
        Shalin Shekhar Mangar added a comment - Mike, this works fine for NRT replication but replying on it for recovery would probably cause the entire index to be re-downloaded because local merges on replicas will create segments with a different unique id.
        Hide
        Michael McCandless added a comment -

        Hmm, it depends on how merging works b/w primary & replica? E.g. we could merge only on primary and copy merged segments out. Yes, this is added bandwidth, but then merging is also CPU / IO intensive, so saving that work for the replicas may net/net be worthwhile. And copying out a merged segment to a replica can be a lowish priority thing, e.g. it need not gate NRT reopen time (just like how IW's merged segment warmer doesn't either).

        But yeah if the replicas do their own merging, somehow with a merge policy that's guaranteed to precisely match what the primary does (despite e.g. ConcurrentMergeScheduler causing merges to complete in different orders), then we would need to somehow set the IDs of the merged segments on the replicas to match the corresponding merged segments on the primaries. Or, perhaps ... we compute the ID of a merged segment by hashing the IDs of the N segments that were merged ... hmm risky because if there were different deleted docs then the IDs should differ, so maybe we hash on that too ... tricky.

        Show
        Michael McCandless added a comment - Hmm, it depends on how merging works b/w primary & replica? E.g. we could merge only on primary and copy merged segments out. Yes, this is added bandwidth, but then merging is also CPU / IO intensive, so saving that work for the replicas may net/net be worthwhile. And copying out a merged segment to a replica can be a lowish priority thing, e.g. it need not gate NRT reopen time (just like how IW's merged segment warmer doesn't either). But yeah if the replicas do their own merging, somehow with a merge policy that's guaranteed to precisely match what the primary does (despite e.g. ConcurrentMergeScheduler causing merges to complete in different orders), then we would need to somehow set the IDs of the merged segments on the replicas to match the corresponding merged segments on the primaries. Or, perhaps ... we compute the ID of a merged segment by hashing the IDs of the N segments that were merged ... hmm risky because if there were different deleted docs then the IDs should differ, so maybe we hash on that too ... tricky.
        Hide
        ASF subversion and git services added a comment -

        Commit 1627714 from Robert Muir in branch 'dev/branches/lucene5969'
        [ https://svn.apache.org/r1627714 ]

        LUCENE-5969, LUCENE-5895: fix sign bit bugs in segment/commit IDs, use byte[] representation

        Show
        ASF subversion and git services added a comment - Commit 1627714 from Robert Muir in branch 'dev/branches/lucene5969' [ https://svn.apache.org/r1627714 ] LUCENE-5969 , LUCENE-5895 : fix sign bit bugs in segment/commit IDs, use byte[] representation
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development