Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I wrote a replication module which I think will be useful to Lucene users who want to replicate their indexes, e.g. for high-availability, taking hot backups, etc.

      I will upload a patch soon where I'll describe in general how it works.

      Attachments

      1. LUCENE-4975.patch
        294 kB
        Shai Erera
      2. LUCENE-4975.patch
        293 kB
        Shai Erera
      3. LUCENE-4975.patch
        295 kB
        Shai Erera
      4. LUCENE-4975.patch
        291 kB
        Shai Erera
      5. LUCENE-4975.patch
        292 kB
        Shai Erera
      6. LUCENE-4975.patch
        292 kB
        Shai Erera
      7. LUCENE-4975.patch
        291 kB
        Shai Erera
      8. LUCENE-4975.patch
        286 kB
        Shai Erera
      9. LUCENE-4975.patch
        281 kB
        Shai Erera
      10. LUCENE-4975.patch
        276 kB
        Shai Erera
      11. LUCENE-4975.patch
        177 kB
        Shai Erera
      12. LUCENE-4975.patch
        267 kB
        Shai Erera
      13. LUCENE-4975.patch
        269 kB
        Shai Erera

        Activity

        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Shai Erera added a comment -

        Shyam, check out this blog post (http://shaierera.blogspot.com/2013/05/the-replicator.html), which explains how the Replicator works and includes some example code. The javadocs also contain examples. If you run into any issues, don't hesitate to email java-user@lucene.apache.org.

        Shyam V S added a comment -

        Shai Erera,
        If I want to try out this feature, how and where should I start? I'm planning to try it out in a Master + 2 slaves Lucene (integrated with Hibernate) setup.

        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Mike!

        Commit Tag Bot added a comment -

        [trunk commit] shaie
        http://svn.apache.org/viewvc?view=revision&revision=1481804

        LUCENE-4975: Add Replication module to Lucene

        Shai Erera added a comment -

        Patch adds instance InfoStream members instead of relying on the static default. Beasted 10K+ iterations for both IndexReplicationClientTest and IndexAndTaxonomyReplicationClientTest, all pass.

        I think it's ready. I plan to commit it tomorrow.

        Shai Erera added a comment -

        New patch changes how handlers work:

        • Beasting found a seed which uncovered a major problem with their current operation. They were trying to be "too honest" with the index and e.g. revert/delete upon any exception that occurred.
        • Thanks to MDW, Mike and I decided to keep the handlers simple – if a handler successfully copies + syncs the revision files, then this is considered the "new revision".
        • Kissing the index is now done not through IndexWriter, but rather by deleting all files not referenced by the last commit (see the sketch after this comment).
          • That cleanup is a best-effort only ... if it fails, the handler just logs it and does not act on it. Cleanup can happen later too.
        • That means that if you have a really nasty crazy IO system (like MDW sometimes acts), the Replicator is not the one that's going to care about it. The app will hit those weird errors in other places too, e.g. when it tries to refresh SearcherManager or perform search.
          • These errors are not caused by the Replicator or bad handler operation. I.e. if the handler successfully called fsync(), yet the IO system decides to fail later ... that's really not the handler's problem.
        • Therefore the handlers are now simpler, don't use IW (and the crazy need to rollback()), and once the files are successfully copied + sync'd, no more exceptions are thrown by the handler (except that the callback may fail, but that's ok).
        • I also removed the timeout behavior the test employed – now that ReplicationClient has isUpdateThreadAlive(), assertHandlerRevision loops as long as the client is alive. If there's a serious bug, test-framework will terminate the test after 2 hours ...
        • ReplicationClient.startUpdateThread is nicer – allows starting the thread if updateThread != null, but !isAlive.

        Now beasting this patch.
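
        (A minimal sketch, not from the patch, of the "delete all files not referenced by the last commit" cleanup described above; it assumes the 4.x SegmentInfos/Directory API, the class and method names are illustrative only, and the real handler has additional guards.)

        import java.util.Collection;

        import org.apache.lucene.index.IndexFileNames;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.SegmentInfos;
        import org.apache.lucene.store.Directory;

        public class CleanupSketch {

          /** Best-effort removal of files not referenced by the last commit; never throws. */
          public static void cleanupUnreferencedFiles(Directory dir) {
            try {
              SegmentInfos last = new SegmentInfos();
              last.read(dir); // reads the latest segments_N
              Collection<String> referenced = last.files(dir, true); // true = include the segments_N file itself
              for (String file : dir.listAll()) {
                if (referenced.contains(file)
                    || file.equals(IndexFileNames.SEGMENTS_GEN)
                    || file.equals(IndexWriter.WRITE_LOCK_NAME)) {
                  continue; // keep referenced files, segments.gen and the write lock
                }
                try {
                  dir.deleteFile(file);
                } catch (Exception e) {
                  // best-effort: ignore, cleanup can happen on a later update
                }
              }
            } catch (Exception e) {
              // best-effort only: log and move on, never fail the revision update
            }
          }
        }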

        Shai Erera added a comment -

        Mike tripped a seed, reproducible only on Linux, which uncovered a bug in the handlers – writing the segments.gen file should be done only after updateHandlerState() is called, and from what the handler says is its last state. Otherwise, we could write segments.gen w/ gen=7, but if "kissing" the index or updating the state (reading commits) trips an error, the handler is reset back to the last revision it knows is good, and we end up w/ a segments.gen file from the future. This then causes DirReader.open to fail w/ FNFE looking for a futuristic segments_N file.

        MockDirWrapper is a nasty bitch. It should be called WhackoIOSubSystemDirectory!

        Patch improves the handlers + makes the tests more resilient so they don't enter an infinite loop (that's part of what Mike hit).

        I'm beasting more. Please review the handlers' segments.gen writing logic.

        Shai Erera added a comment -

        Good idea Mike. I factored out touchIndex, cleanupFilesOnFailure and copyFiles to static utilities on IndexReplHandler.

        Maybe instead of the static utilities, we can have an abstract IRH with two impls: SingleIndexReplicationHandler and IndexAndTaxoReplHandler. It can provide the utility methods as protected, plus implement the general framework for index replication, using abstract callbacks that are implemented by the concrete handlers. But perhaps we should do it later.

        Michael McCandless added a comment -

        +1, I looked at the replication handlers and they look great!

        I wonder if we could factor out touchIndex to a static method and share from IndexReplicationHandler and IndexAndTaxoReplicationHandler?

        Shai Erera added a comment -

        Forgot to include one change – handleUpdateEx should fail if the exception it hits is not IOE. Previous patch called super.handle(), which only logged it. But I think it's fair to say that the test shouldn't hit any exception, and terminate the client if it does.

        Shai Erera added a comment -

        Mike found a seed which tripped a test bug. Fixed it and on the way made the test less sensitive to time changes (i.e. it waited up to 20 seconds for the index to get updated, otherwise failed) and added some other improvements (to the test only).

        Let's search for more seeds.

        Shai Erera added a comment -

        I ran both tests w/ tests.iters=1000 and they passed. This gives me more confidence about the robustness of these two handlers. Still, other machines can dig up "special" seeds.

        Shai Erera added a comment - edited

        Patch resolves all remaining nocommits. I made both IndexReplicationHandler and IndexAndTaxonomyReplicationHandler more resilient to errors by carefully reverting any changes made to the target indexes on errors. Also ensured that segments_N files are written last.

        I also added code to write segments.gen. To do that I factored out SegmentInfos.writeSegmentsGen which takes a Directory and generation. It is now used by SIS.finishCommit and the handlers.

        Would appreciate it if someone can review the handlers, and whether they can be written in a more resilient way.

        I want to beast IndexReplicationClientTest and IndexAndTaxonomyReplicationClientTest testConsistencyOnExceptions before committing. If anyone's interested in helping, just fire ant test -Dtestcase=IndexReplicationClientTest -Dtests.method=testConsistencyOnExceptions -Dtests.iters=1000 (and the same for IndexAndTaxonomyReplicationClientTest) and let me know the seeds that failed.

        Shai Erera added a comment - edited

        The bug is that IndexWriter.close() waits for merges and commits. Let's quit kidding ourselves, ok?

        Not sure I agree .. the bug is real, and if somebody did new IW().commit().close() without making any change, he might be surprised about that commit too, and also IW would overwrite an existing file. The only thing "close-not-committing" would solve in this case is that I won't need to call rollback(), but close(). The bug won't disappear.

        Robert Muir added a comment -

        I chatted about this with Mike and he confirmed my reasoning. This is very slim chance, and usually indicates a truly bad (or crazy) IO subsystem (i.e. not like MDW throwing random IOEs on opening the same file over and over again). I think perhaps this can be solved in IW by having it refer to the latest commit point read by IFD and not what it read. This seems safe to me, but perhaps an overkill. Anyway, it belongs in a different issue.

        What also happened here is that IW overwrote segments_a (violating write-once policy), which MDW didn't catch because for replicator tests I need to turn off preventDoubleWrite. Mike also says that IW doesn't guarantee write-once if it hits exceptions ...

        Sorry, I disagree and am against this.

        The bug is that IndexWriter.close() waits for merges and commits. Let's quit kidding ourselves, ok?

        Shai Erera added a comment -

        I'm trying really hard not to say anything sarcastic here

        Heh, I expected something from you.

        I chatted about this with Mike and he confirmed my reasoning. This is very slim chance, and usually indicates a truly bad (or crazy) IO subsystem (i.e. not like MDW throwing random IOEs on opening the same file over and over again). I think perhaps this can be solved in IW by having it refer to the latest commit point read by IFD and not what it read. This seems safe to me, but perhaps an overkill. Anyway, it belongs in a different issue.

        What also happened here is that IW overwrote segments_a (violating write-once policy), which MDW didn't catch because for replicator tests I need to turn off preventDoubleWrite. Mike also says that IW doesn't guarantee write-once if it hits exceptions ...

        So I think the safest solution is to deleteUnused() + rollback(), as anyway the handler must ensure that commits are not created by this "kiss".

        I will resolve the remaining nocommits and post a new patch.

        Robert Muir added a comment -

        I'm trying really hard not to say anything sarcastic here

        Shai Erera added a comment -

        Ok some more insights ... I think this additional chain of events occurs:

        • IW's ctor reads gen 9, and SegmentInfos.getSegmentsFileName returns segments_9.
        • IFD then successfully reads both segments_9 and segments_a, ending up w/ two commit points.
        • IFD sorts them and passes to IndexDeletionPolicy (KeepLastCommit) which deletes segments_9 and keeps segments_a
        • IFD marks startingCommitDeleted, since the starting commit was indeed deleted

        And here's what I still don't understand – for some reason, IW creates a new commit point, segments_a, with the commitData from segments_9. Still need to dig into that.

        In the meantime, I made the above change to the handler (to rollback(), not close()), and 430 iterations passed. Not sure if that's the right way to go ... if IW.close() didn't commit ...

        Shai Erera added a comment -

        Patch fixes a bug in IndexReplicationHandler (still need to fix in IndexAndTaxonomy) and adds some nocommits which I want to take care before I commit it.

        However, I hit a new test failure, which reproduces with the following command ant test -Dtestcase=IndexReplicationClientTest -Dtests.method=testConsistencyOnExceptions -Dtests.seed=EAF5294292642F1:6EE70BB59A9FC3CA.

        The error is weird. I ran the test w/ -Dtests.verbose=true and here's the troubling parts from the log:

        ReplicationThread-index: MockDirectoryWrapper: now throw random exception during open file=segments_a
        java.lang.Throwable
        	at org.apache.lucene.store.MockDirectoryWrapper.maybeThrowIOExceptionOnOpen(MockDirectoryWrapper.java:364)
        	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:522)
        	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:281)
        	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:340)
        	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:668)
        	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:515)
        	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
        	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:682)
        	at org.apache.lucene.replicator.IndexReplicationHandler.revisionReady(IndexReplicationHandler.java:208)
        	at org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:248)
        	at org.apache.lucene.replicator.ReplicationClient.access$1(ReplicationClient.java:188)
        	at org.apache.lucene.replicator.ReplicationClient$ReplicationThread.run(ReplicationClient.java:76)
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: current segments file is "segments_9"; deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@117da39a
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: load commit "segments_9"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: load commit "segments_a"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: now checkpoint "_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1" [9 segments ; isCommit = false]
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: 0 msec to checkpoint
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: deleteCommits: now decRef commit "segments_9"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "segments_9"
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: init: create=false
        
        ....
        
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: startCommit(): start
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: startCommit index=_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1 changeCount=1
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: done all syncs: [_2.si, _7.si, _5.cfs, _1.fnm, _4.cfs, _8.si, _4.cfe, _5.cfe, _0.si, _0.fnm, _6.cfe, _8.cfs, _3.cfs, _4.si, _7.cfe, _2.cfs, _5.si, _6.cfs, _1.fdx, _8.cfe, _1.fdt, _1.si, _7.cfs, _0.fdx, _3.si, _6.si, _3.cfe, _2.cfe, _0.fdt]
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: pendingCommit != null
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: wrote segments file "segments_a"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: now checkpoint "_0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1" [9 segments ; isCommit = true]
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: deleteCommits: now decRef commit "segments_a"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.cfe"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.cfs"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: delete "_9.si"
        IFD 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: 0 msec to checkpoint
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: commit: done
        IW 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: at close: _0(5.0):C1 _1(5.0):C1 _2(5.0):c1 _3(5.0):c1 _4(5.0):c1 _5(5.0):c1 _6(5.0):c1 _7(5.0):c1 _8(5.0):c1
        IndexReplicationHandler 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: updateHandlerState(): currentVersion=a currentRevisionFiles={index=[Lorg.apache.lucene.replicator.RevisionFile;@9bc2e26e}
        IndexReplicationHandler 0 [Wed May 08 22:47:46 WST 2013; ReplicationThread-index]: {version=9}
        

        I debug traced it and here's what I think is happening:

        • MDW throws FNFE for segments_a on sis.read(dir), therefore the read SegmentInfos sees segments_9 as the current good commit. IW's segmentInfos.commitData stores version=9, which corresponds to segments_9.
        • IFD lists the files in the Directory, and finds both segments_a and segments_9 and through a series of calls, deletes segments_9 and keeps segments_a, since it is newer.
        • IW ctor, line 719, increments changeCount, since IFD.startingCommitDeleted is true – which happens b/c IFD is initialized with segments_9, but finds segments_a and therefore deletes it.
        • IW then makes a commit, with the commit data from segments_9 ("version=9"), to a new commit point generation 10 (a in hex).
        • The Replicator's latest version is gen=10, the handler reads gen=10 from the index, but with the wrong commitData, and therefore the test fails.

        I still want to review all this again, to double-check my understanding, but it looks like something bad is happening between IW and IFD. At least from the perspective of the replicator, the index shouldn't "go forward" by new IW().close().

        If I modify the handler to do:

        IndexWriter writer = new IndexWriter(); // open a writer on the index directory (ctor args elided here)
        writer.deleteUnusedFiles();             // drop files not referenced by any commit
        writer.rollback();                      // rollback() rather than close(), so no new commit is created
        

        The test passes. But is this the right solution – i.e. guarantee that IW never commits? Or is this a bug in IW?

        Shai Erera added a comment -

        I'm having reservations about creating a replicator/facet module which contains 2 classes ... Maybe we should proceed with the code as-is, and then refactor if it creates a problem, or the module grows? Perhaps the breakout won't be to replicator/common and replicator/facet but to replicator/infra (or common) and replicator/extras which will serve like a catchall for other modules too (e.g. facet, suggest).

        Another way is to break out replicator to common and framework/infra/impl such that common contains only whatever other modules require to compile against (i.e Revision, ReplicationHandler, maybe Replicator). Then we can add the facet replication code to facet/ with a dependency on replicator/common.

        But really, I think we should just get it in and start to work with it, have deeper reviews and refactor as we go.

        Shai Erera added a comment -

        I think that's not a bad idea! replicator/common will include the interfaces (Revision and ReplicationHandler) + the framework impl and also IndexRevision/Handler. replicator/facet will include the taxonomy parts and depend on replicator/common and facet.

        I can also move the facet related code under oal.replicator.facet and then suppress the Lucene3x codec for just these tests.

        If others agree, I'll make the changes (mostly build.xml changes).

        Adrien Grand added a comment -

        Then maybe we could have sub-modules for specific replication strategies?

        To make my point a little clearer, I was suggesting something pretty much like the analysis module: analyzers that require additional dependencies (such as icu or morfologik) are in their own sub-module so that you don't need to pull the ICU or Morfologik JARs if you just want to use LetterTokenizer (which is in lucene/analysis/common).

        Likewise, we could have the interface and the logic to replicate simple (no sidecar data) indexes in lucene/replicator/common and have sub-modules for facet (lucene/replicator/facet) or suggesters (lucene/replicator/suggesters).

        This may look like overkill, but at least this would help us keep dependencies clean between modules.

        Shai Erera added a comment -

        Ok, so there are 3 options I see: (1) have Replicator depend on Facet (and in the future on other modules), (2) have Facet depend on Replicator and (3) move Revision and ReplicationHandler (interfaces) someplace else, core or a new module we call 'commons', and have Replicator and Facet depend on it. Tests, though, will need to depend on replicator, since they need ReplicationClient.

        BTW, the jetty dependencies are tests only, but I don't know how to make ivy resolve the dependencies just for tests. The only thing replicator depends on is servlet-api, for ReplicationService and httpclient for ReplicationClient. I think these need to remain in the module ...

        If we made Facet depend on Replicator (I'm not totally against it), would that require you to have lucene-replicator.jar on the classpath, even if you don't use replication? If not, then perhaps this dependency isn't so bad ... it's just a compile-time dependency. Tests will still need to depend on replicator for runtime, but that's ok I think.

        Robert Muir added a comment -

        I still haven't had a chance to look at the patch: but it sounds like some work needs to be done here to prevent dll hell.

        having replicator depend upon all sidecar modules is a no-go.

        it sounds like an interface is missing.

        Adrien Grand added a comment -

        Then maybe we could have sub-modules for specific replication strategies? lucene/replicator would only know how to handle raw indexes, while lucene/replicator/facets or lucene/replicator/suggest would implement custom logic?

        This way lucene/facet wouldn't need to pull all lucene/replicator transitive dependencies, and lucene/replicator wouldn't depend on any lucene module but lucene/core.

        Shai Erera added a comment -

        As I said, arguments can be made both ways ... I don't know what the best way is here. I can see your point, but I don't feel good about having facet depend on replicator. I see Replicator as a higher-level service that, besides providing the replication framework, also comes pre-built for replicating Lucene stuff. I don't mind seeing it grow to accommodate other Revision types in the future. For example, IndexAndTaxonomyRevision is just an example for replicating multiple indexes together. It can easily be duplicated to replicate a few indexes at once, e.g. a MultiIndexRevision. Where would that object be? It cannot be in core, so why should IndexAndTaxo be in facet?

        Robert Muir added a comment -

        Even in the future, I would imagine that if we added support for replicating a suggester files, then it would make sense to put a dependency between replicator and suggester, rather than the other way around.

        Wait: how does this make sense?!

        It should be the other way around: if suggester has a sidecar it needs special logic for replication.

        It does not need faceting.

        Adrien Grand added a comment -

        Good points, you convinced me.

        Shai Erera added a comment -

        maybe also call MDW.setRandomIOExceptionRateOnOpen

        Thanks Mike! I added that and a slew of problems surfaced, most of them in the test, but I improved the handlers' implementation to clean up after themselves if e.g. a copy or sync to the handlerDir failed. While this wasn't a bug, it leaves the target index directory clean.

        There's one nocommit which bugs me though – I had to add dir.setPreventDoubleWrite(false) because when the handler fails during copying of, say, _2.fdt to the index dir, the file is deleted from the indexDir, and the client re-attempts to upgrade. At this point, MDW complains that _2.fdt was already written to, even though I deleted it.

        Adding this setPrevent was the only way I could make MDW happy, but I don't like it since I do want to catch errors in the handler/client if they e.g. attempt to copy over an existing file.

        Maybe we can make MDW respond somehow to delete()? I know that has bad implications on its own, e.g. code which deletes and then accidentally recreates files with older names ... any ideas?
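
        (A minimal sketch, not the actual test code, of how a test might wire up MockDirectoryWrapper with the settings discussed above; the class and method names of the sketch itself are illustrative.)

        import org.apache.lucene.store.MockDirectoryWrapper;
        import org.apache.lucene.util.LuceneTestCase;

        public class ReplicatorIoErrorsSketch extends LuceneTestCase {

          public void testConsistencyUnderRandomIoErrors() throws Exception {
            MockDirectoryWrapper dir = newMockDirectory();
            dir.setRandomIOExceptionRate(0.1);       // random IOExceptions on already-open files
            dir.setRandomIOExceptionRateOnOpen(0.1); // also fail openInput/createOutput randomly
            dir.setPreventDoubleWrite(false);        // the handler may legitimately re-copy a file it deleted after a failure
            // ... run the ReplicationClient update cycle against 'dir' and assert index consistency ...
            dir.close();
          }
        }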

        Shai Erera added a comment -

        I've been wondering about that too, but chose to keep the facet replication code under replicator for a few reasons:

        • A Revision contains files from multiple sources, and the taxonomy index is partly responsible for that. And ReplicationClient respects that – so I guess it's not entirely true that the Replicator is unaware of taxonomy (even though it would still work if I pulled the taxonomy stuff out of it).
        • I think it makes less sense to require lucene-replicator.jar for every faceted search app which makes use of lucene-facet.jar. The key reason is that replicator requires a few additional jars such as httpclient, httpcore, jetty, servlet-api. Requiring lucene-facet.jar seems less painful to me than requiring every faceted search app out there to include all these jars even if it doesn't want to do replication.
        • I like to keep things local to the module. There are many similarities between IndexAndTaxoRevision and IndexRevision (likewise for their handlers and tests). Therefore whenever I made a change to one, I knew I should go make a similar change to the other.

        All in all, I guess arguments can be made both ways, but I prefer for now to keep things local to the replicator module. Even in the future, I would imagine that if we added support for replicating a suggester's files, then it would make sense to put a dependency between replicator and suggester, rather than the other way around.

        Adrien Grand added a comment -

        +1 to commit too.

        Looking at the code, there seems to be specialized implementations for faceting because of the need to replicate the taxonomy indexes too, so I was wondering that maybe this facet-specific code should be under lucene/facets rather than lucene/replicator so that lucene/replicator doesn't need to depend on all modules that have specific replication needs. (I'm not sure what the best option is yet, this can be addressed afterwards.)

        Tommaso Teofili added a comment -

        +1 for committing it

        Michael McCandless added a comment -

        +1 to commit and iterate from here on... this new module looks very nice!

        I like the new testConsistencyOnException ... maybe also call MDW.setRandomIOExceptionRateOnOpen? This will additionally randomly throw exceptions from openInput/createOutput.

        Shai Erera added a comment -

        Added testConsistencyOnException to test the client and handlers' behavior when they encounter exceptions (I use MockDirectoryWrapper's diskFull and randomIOE settings to simulate that).

        I think this module is basically ready. I.e. it comes with tests, javadocs and pretty much does what it was written to do. I'm sure there's room for improvement, but I don't think this should hold off the commit. So unless there are any objections, I intend to commit it by Tuesday. If people want to do a thorough review, I don't mind waiting with the commit, but just drop a comment on the issue to let me know.

        Shai Erera added a comment -

        Patch adds @lucene.experimental to all classes. I also excluded all the license files from the patch, so that it's smaller and easier to review.

        Shai Erera added a comment -

        Patch updates IndexRevision and IndexAndTaxonomyRevision following the recent changes to SDP (not requiring snapshot ID). I also noticed I put grouping in pom.template rather than facet - fixed.

        As for this weird error I got, I noticed that the offending files under test-framework had an extra line none of the other modules seemed to have: <div role="navigation" title="All Classes">. I have no idea how it got there. So I ran 'ant clean documentation' and it passed! I will run precommit tomorrow.

        Shai Erera added a comment -

        Patch contains the Replicator code + all needed stuff to make it a valid module (ivy, maven, licenses etc.). A big portion of the patch is due to the ivy/maven/licenses things, so if you focus on the code, it's not that big.

        I ran precommit and it fails with this cryptic error:

        -documentation-lint:
             [echo] checking for broken html...
            [jtidy] Checking for broken html (such as invalid tags)...
           [delete] Deleting directory D:\dev\lucene\lucene-replicator\lucene\build\jtidy_tmp
             [echo] Checking for broken links...
             [exec] Traceback (most recent call last):
             [exec]   File "D:\dev\lucene\lucene-replicator\dev-tools/scripts/checkJavadocLinks.py", line 260, in
             [exec] Crawl/parse...
             [exec]  <module>
             [exec]     if checkAll(sys.argv[1]):
             [exec]   File "D:\dev\lucene\lucene-replicator\dev-tools/scripts/checkJavadocLinks.py", line 160, in checkAll
             [exec]     allFiles[fullPath] = parse(fullPath, open('%s/%s' % (root, f), encoding='UTF-8').read())
             [exec]   File "D:\dev\lucene\lucene-replicator\dev-tools/scripts/checkJavadocLinks.py", line 110, in parse
             [exec]     parser.feed(html)
             [exec]   File "/usr/lib/python3.2/html/parser.py", line 142, in feed
             [exec]     self.goahead(0)
             [exec]   File "/usr/lib/python3.2/html/parser.py", line 188, in goahead
             [exec]     k = self.parse_endtag(i)
             [exec]   File "/usr/lib/python3.2/html/parser.py", line 454, in parse_endtag
             [exec]     self.handle_endtag(elem.lower())
             [exec]   File "D:\dev\lucene\lucene-replicator\dev-tools/scripts/checkJavadocLinks.py", line 92, in handle_endtag
             [exec]     raise RuntimeError('%s %s:%s: saw </%s> but expected </%s>' % (self.baseURL, self.getpos()[0], self.getpos()[1], tag, self.stack[-1]))
             [exec] RuntimeError: file:///build/docs/core/allclasses-frame.html 680:0: saw </body> but expected </div>
        

        Does anyone know what this is? I searched for <div> and I don't have any. Also, if I run 'ant documentation' from top-level, it passes. I ran this w/ Oracle 1.7. I did find a closing </body> tag with no opening tag under the jetty license, but I don't think it's related?

        Otherwise, maven check passes, and tests compile and run successfully.

        Hide
        Shai Erera added a comment -

        So here's an overview of how the Replicator works (it's also documented under oal.replicator.package.html):

        At a high level, producers (e.g. an indexer) publish Revisions, and consumers update to the latest available Revision. Like SVN, if a client is on rev1 and the server is already on rev4, the next update request upgrades the client straight to rev4, skipping all intermediate revisions.

        The Replicator offers two implementations at the moment: LocalReplicator, to be used at the server side, and HttpReplicator, to be used by clients to e.g. update over HTTP. In the future, we may want to add other Replicator implementations, e.g. rsync, torrent... For HTTP, the package also provides a ReplicationService which acts on the HTTP servlet request/response following some API specification. In that sense, the HttpReplicator expects a certain HTTP implementation on the server side, and ReplicationService helps you by implementing that API. The reason it's not a servlet itself is so that you can plug it into your application servlet freely.
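
        For illustration, plugging the ReplicationService into your own servlet could look roughly like this (the doGet method, the HashMap and the "shard1" name are placeholders I made up, not part of the module):

        // server side (sketch): expose a LocalReplicator over HTTP from an existing servlet
        Replicator replicator = new LocalReplicator();
        Map<String, Replicator> replicators = new HashMap<String, Replicator>();
        replicators.put("shard1", replicator); // "shard1" is an illustrative index name
        ReplicationService service = new ReplicationService(replicators);
        
        // inside your own servlet, delegate requests to the service (illustrative):
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
          service.perform(req, resp);
        }
        
        // on the client side, an HttpReplicator is then pointed at the host/port and
        // context path under which the servlet is deployed.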

        A Revision is basically a list of files and sources. For example, IndexRevision contains the list of files in an IndexCommit (and only one source), while IndexAndTaxonomyRevision contains the list of files from both IndexCommits, with corresponding sources (index/taxonomy). When the server publishes either of these two revisions, the IndexCommits are snapshotted so that files aren't deleted, and the Replicator serves file requests (by clients) from the Revision. The Revision is also responsible for releasing itself – this is done automatically by the Replicator, which releases a revision when it's no longer needed (i.e. there's a new one already) and there are no clients currently replicating its files.
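
        Since publishing a revision snapshots the IndexCommit, the publishing IndexWriter has to be configured with a SnapshotDeletionPolicy. Here's a minimal sketch (indexDir, analyzer and replicator are assumed to already exist):

        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_44, analyzer); // analyzer assumed to exist
        // wrap the default deletion policy so published commits can be snapshotted
        conf.setIndexDeletionPolicy(new SnapshotDeletionPolicy(conf.getIndexDeletionPolicy()));
        IndexWriter publishWriter = new IndexWriter(indexDir, conf); // indexDir assumed to exist
        
        // whenever there is something new to replicate:
        publishWriter.commit();
        replicator.publish(new IndexRevision(publishWriter)); // replicator from the server-side example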

        On the client side, the package offers a ReplicationClient class which can be invoked either manually, or start its update thread to periodically check for updates. The client is given a ReplicationHandler (two matching implementations: IndexReplicationHandler and IndexAndTaxonomyReplicationHandler) which is responsible for acting on the replicated files. The client first obtains all needed files (i.e. those that the new Revision offers and the client is still missing), and only after they have all been successfully copied over is the handler invoked. Both handlers copy the files from their temporary location to the index directories, fsync them and clean up the index so that unused files are deleted. You can provide each handler a Callable which is invoked after the index has been safely and successfully updated, so you can e.g. call searcherManager.maybeReopen().
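
        For example, a callback that refreshes a SearcherManager once the new revision is in place could be as simple as this sketch (a searcherManager is assumed to be available, e.g. as a final field; whether the call is maybeReopen() or maybeRefresh() depends on the Lucene version):

        // assumes a SearcherManager named 'searcherManager' is visible to the callback
        Callable<Boolean> callback = new Callable<Boolean>() {
          @Override
          public Boolean call() throws Exception {
            searcherManager.maybeRefresh(); // maybeReopen() on older releases
            return true;
          }
        };
        ReplicationHandler handler = new IndexReplicationHandler(indexDir, callback);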

        Here's a general code example that explains how to work with the Replicator:

        // ++++++++++++++ SERVER SIDE ++++++++++++++ //
        IndexWriter publishWriter; // the writer used for indexing
        Replicator replicator = new LocalReplicator();
        replicator.publish(new IndexRevision(publishWriter));
        
        // ++++++++++++++ CLIENT SIDE ++++++++++++++ //
        // either LocalReplicator, or HttpReplicator if client and server are on different nodes
        Replicator replicator;
        
        // callback invoked after the handler finished handling the revision, e.g. to reopen the reader;
        // can be null if no callback is needed
        Callable<Boolean> callback = null;
        ReplicationHandler handler = new IndexReplicationHandler(indexDir, callback);
        SourceDirectoryFactory factory = new PerSessionDirectoryFactory(workDir);
        ReplicationClient client = new ReplicationClient(replicator, handler, factory);
        
        // invoke the client manually
        client.updateNow();
        
        // or, periodically
        client.startUpdateThread(100); // check for an update every 100 milliseconds

        The package of course comes with unit tests, though I'm sure there's room for improvement (there always is!).


          People

          • Assignee: Shai Erera
          • Reporter: Shai Erera
          • Votes: 0
          • Watchers: 5
