  1. Lucene - Core
  2. LUCENE-7302

IndexWriter should tell you the order of indexing operations

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (7.0), 6.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today, when you index concurrently with multiple threads, Lucene
      knows the effective order in which those operations were applied to
      the index, but doesn't return that information to you.

      This is important to know if you want to build a reliable search
      API on top of Lucene. Combined with the recently added NRT
      replication (LUCENE-5438), it can be a strong basis for an efficient
      distributed search API.

      I think we should return this information, since we already have it,
      and since it could simplify servers (ES/Solr) on top of Lucene:

      • They would not need locking to prevent the same id from being
        indexed concurrently, since they could instead check the returned
        sequence number to know which update "won", for features like
        "realtime get". (Locking is probably still needed for features
        like optimistic concurrency.)
      • When re-applying operations from a prior commit point, e.g. on
        recovering after a crash from a transaction log, they can know
        exactly which operations made it into the commit and which did
        not, and replay only the truly missing operations.
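
      The recovery case in the second bullet can be sketched roughly as
      follows. This is a minimal, Lucene-free illustration: LogEntry,
      missingOps and commitSeqNo are hypothetical names, and it assumes the
      server stored each operation's returned sequence number in its
      transaction log and also recorded the highest sequence number
      included in the last commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReplayFilter {
    /** Hypothetical transaction-log entry: the sequence number the writer returned, plus the operation. */
    static final class LogEntry {
        final long seqNo;
        final String op;

        LogEntry(long seqNo, String op) {
            this.seqNo = seqNo;
            this.op = op;
        }
    }

    /**
     * Returns the entries that are NOT covered by the commit, in sequence order.
     * Assumption: every operation with seqNo <= commitSeqNo made it into the commit.
     */
    static List<LogEntry> missingOps(List<LogEntry> log, long commitSeqNo) {
        List<LogEntry> missing = new ArrayList<>();
        for (LogEntry e : log) {
            if (e.seqNo > commitSeqNo) {
                missing.add(e);
            }
        }
        // Log entries may have been appended out of order by concurrent threads.
        missing.sort((a, b) -> Long.compare(a.seqNo, b.seqNo));
        return missing;
    }

    public static void main(String[] args) {
        List<LogEntry> log = Arrays.asList(
            new LogEntry(2, "add id=2"),
            new LogEntry(1, "add id=1"),
            new LogEntry(4, "update id=2"),
            new LogEntry(3, "delete id=1"));
        // Pretend the last commit covered everything up to sequence number 2.
        for (LogEntry e : missingOps(log, 2L)) {
            System.out.println("replay seqNo=" + e.seqNo + " op=" + e.op);
        }
    }
}
```

      Sorting before replay matters only if the log itself was written by
      several threads; a log that is already in sequence order could skip
      that step.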

      Not returning this just hurts people who try to build servers on top
      with clear semantics on crashing/recovering ... I also struggled with
      this when building a simple "server wrapper" on top of Lucene
      (LUCENE-5376).

      1. LUCENE-7032.patch
        80 kB
        Michael McCandless
      2. LUCENE-7132.patch
        11 kB
        Michael McCandless

        Activity

        mikemccand Michael McCandless added a comment -

        I've been pushing changes to this branch:

        https://github.com/mikemccand/lucene-solr/tree/sequence_numbers

        I think it's close ... I've resolved all nocommits, and created some
        fun tests with threads updating the same doc at once, doing concurrent
        commits, and verifying that what the sequence numbers claim turns out
        to be true.

        The changes are relatively minor: IW already "knows" the order in which
        operations were applied, but its indexing methods return void today;
        this change makes them return a long sequence number instead. Callers
        who don't care can just ignore the returned long.
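
        As an illustration of how a caller might use the returned long, here
        is a minimal sketch of a "which update won" map. It does not use
        Lucene: StubWriter is a hypothetical stand-in that only hands out
        increasing sequence numbers, and VersionMapSketch, record and
        realtimeGet are likewise made-up names for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class VersionMapSketch {
    /** Hypothetical stand-in for a writer whose update call returns a sequence number. */
    static final class StubWriter {
        private final AtomicLong nextSeqNo = new AtomicLong(1);

        long updateDocument(String id, String value) {
            // A real writer would index the document here; the stub only assigns an order.
            return nextSeqNo.getAndIncrement();
        }
    }

    /** A value together with the sequence number of the update that wrote it. */
    static final class Versioned {
        final long seqNo;
        final String value;

        Versioned(long seqNo, String value) {
            this.seqNo = seqNo;
            this.value = value;
        }
    }

    private final StubWriter writer = new StubWriter();
    private final ConcurrentHashMap<String, Versioned> latest = new ConcurrentHashMap<>();

    /** Applies the update, then records it using the sequence number the writer returned. */
    void update(String id, String value) {
        long seqNo = writer.updateDocument(id, value);
        record(id, seqNo, value);
    }

    /** Keeps whichever update has the higher sequence number; no per-id lock is taken. */
    void record(String id, long seqNo, String value) {
        latest.merge(id, new Versioned(seqNo, value),
            (old, candidate) -> candidate.seqNo > old.seqNo ? candidate : old);
    }

    /** Returns the value of the winning update for this id, or null if the id is unknown. */
    String realtimeGet(String id) {
        Versioned v = latest.get(id);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        VersionMapSketch map = new VersionMapSketch();
        map.update("doc1", "first");
        map.update("doc1", "second");
        System.out.println(map.realtimeGet("doc1"));
    }
}
```

        Because the merge compares sequence numbers, threads may record
        their results in any order and the map still ends up holding the
        update that was applied last.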

        It also lets us remove the wrapper class TrackingIndexWriter which
        was doing basically the same thing (returning a long for each op) but
        with weaker guarantees.

        These sequence numbers are fleeting, not saved into commit points,
        etc., and only useful within one IW instance (they reset back to 1 on
        the next IW instance).

        I'll build an applyable patch and post here ...

        mikemccand Michael McCandless added a comment -

        Here's the applyable patch vs current master from the branch... I think it's close, but I need to improve javadocs.

        mikemccand Michael McCandless added a comment -

        Another iteration, I think it's ready.

        jira-bot ASF subversion and git services added a comment -

        Commit b1fb142af003386f985b4c4ad1a583d009d49e41 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b1fb142 ]

        LUCENE-7302: Merge branch 'sequence_numbers'

        mikemccand Michael McCandless added a comment -

        I'll backport this after the 6.1 branch is cut (for 6.2).

        jira-bot ASF subversion and git services added a comment -

        Commit 00584579b70041addbd47859012e25e67e079e10 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0058457 ]

        LUCENE-7302: move CHANGES entry to the right section

        jira-bot ASF subversion and git services added a comment -

        Commit 32c8dfaad5c6d8f79b7d0d7d917db0605f27a9ea in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=32c8dfa ]

        LUCENE-7302: move CHANGES entry to the right place

        mikemccand Michael McCandless added a comment -

        I backported for 6.2.

        jira-bot ASF subversion and git services added a comment -

        Commit 5a0321680fe5e57a17470b824024d5b56a4cbaa4 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5a03216 ]

        LUCENE-7302: ensure IW.getMaxCompletedSequenceNumber only reflects a change after NRT reader refresh would also see it

        jira-bot ASF subversion and git services added a comment -

        Commit 8ed16fd1f9a03c66d4ac81ddaa7ab70359410b95 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ed16fd ]

        LUCENE-7302: ensure IW.getMaxCompletedSequenceNumber only reflects a change after NRT reader refresh would also see it

        jira-bot ASF subversion and git services added a comment -

        Commit 5a0321680fe5e57a17470b824024d5b56a4cbaa4 in lucene-solr's branch refs/heads/apiv2 from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5a03216 ]

        LUCENE-7302: ensure IW.getMaxCompletedSequenceNumber only reflects a change after NRT reader refresh would also see it

        jira-bot ASF subversion and git services added a comment -

        Commit 503da1fcb9fa96c2ba62e9164ee38011b2e23669 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=503da1f ]

        LUCENE-7302: IW.getMaxCompletedSequenceNumber was returning the wrong value after IW.deleteAll

        jira-bot ASF subversion and git services added a comment -

        Commit 4ff882e4aa9cb7fc585213bca9344fa05d1bec5f in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4ff882e ]

        LUCENE-7302: IW.getMaxCompletedSequenceNumber was returning the wrong value after IW.deleteAll

        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.


          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0
            Watchers:
            6
