Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Today, the use of addIndexes and addIndexesNoOptimize is confusing -
      especially on when to invoke each. Also, addIndexes calls optimize() in
      the beginning, but only on the target index. It also includes the
      following jdoc statement, which from how I understand the code, is
      wrong: After this completes, the index is optimized. – optimize() is
      called in the beginning and not in the end.

      On the other hand, addIndexesNoOptimize does not call optimize(), and
      relies on the MergeScheduler and MergePolicy to handle the merges.

      After a short discussion about that on the list (Thanks Mike for the
      clarifications!) I understand that there are really two core differences
      between the two:

      • addIndexes supports IndexReader extensions
      • addIndexesNoOptimize performs better

      This issue proposes the following:

      1. Clear up the documentation of each, spelling out the pros/cons of
        calling them clearly in the javadocs.
      2. Rename addIndexesNoOptimize to addIndexes
      3. Remove optimize() call from addIndexes(IndexReader...)
      4. Document that clearly in both, w/ a recommendation to call optimize()
        before on any of the Directories/Indexes if it's a concern.

      That way, we maintain all the flexibility in the API -
      addIndexes(IndexReader...) allows for using IR extensions,
      addIndexes(Directory...) is considered more efficient, by allowing the
      merges to happen concurrently (depending on MS) and also factors in the
      MP. So unless you have an IR extension, addDirectories is really the one
      you should be using. And you have the freedom to call optimize() before
      each if you care about it, or don't if you don't care. Either way,
      incurring the cost of optimize() is entirely in the user's hands.

      BTW, addIndexes(IndexReader...) does not use neither the MergeScheduler
      nor MergePolicy, but rather call SegmentMerger directly. This might be
      another place for improvement. I'll look into it, and if it's not too
      complicated, I may cover it by this issue as well. If you have any hints
      that can give me a good head start on that, please don't be shy .

      1. LUCENE-2455_3x.patch
        56 kB
        Shai Erera
      2. LUCENE-2455_3x.patch
        77 kB
        Shai Erera
      3. LUCENE-2455_3x.patch
        105 kB
        Shai Erera
      4. LUCENE-2455_3x.patch
        105 kB
        Shai Erera
      5. LUCENE-2455_3x.patch
        105 kB
        Shai Erera
      6. LUCENE-2455_trunk.patch
        141 kB
        Shai Erera
      7. index.31.cfs.zip
        5 kB
        Uwe Schindler
      8. index.31.nocfs.zip
        9 kB
        Uwe Schindler

        Activity

        Hide
        Michael McCandless added a comment -

        Remove optimize() call from addIndexes(IndexReader...)

        This still makes me nervous. Yeah it's bad that this method does optimize() now. But if we remove it, it's bad that this method can attempt to do a ridiculously immense merge, since it [naively] just stuffs everything and and does one merge. Ie, both at are bad.

        Maybe... we could do this: only merge the the incoming IndexReaders, appending a new segment to the end of the index? Ie do no merging whatsoever of the current segments in the index.

        Yes, this can result in "unbalanced" segments (ie, a huge segment appears after the long tail of level 0 segments), but, the merge policy can handle this – it'll work out whatever merges are then necessary to get this segment onto the level that roughly matches its size.

        So unless you have an IR extension, addDirectories is really the one you should be using.

        You mean addIndexes(Directory..)?

        BTW, addIndexes(IndexReader...) does not use neither the MergeScheduler
        nor MergePolicy, but rather call SegmentMerger directly. This might be
        another place for improvement. I'll look into it, and if it's not too
        complicated, I may cover it by this issue as well. If you have any hints
        that can give me a good head start on that, please don't be shy .

        This would be best of all But it's tricky, because our MP/MS assume they are working w/ a SegmentInfo. But, maybe it could somehow be made to work – eg IR does give us maxDoc, numDocs (so we can know del doc count). But eg LogByteSizeMergePolicy goes and computes total byte size of the segment (via SegmentInfo) which we cannot do from an IR.

        Show
        Michael McCandless added a comment - Remove optimize() call from addIndexes(IndexReader...) This still makes me nervous. Yeah it's bad that this method does optimize() now. But if we remove it, it's bad that this method can attempt to do a ridiculously immense merge, since it [naively] just stuffs everything and and does one merge. Ie, both at are bad. Maybe... we could do this: only merge the the incoming IndexReaders, appending a new segment to the end of the index? Ie do no merging whatsoever of the current segments in the index. Yes, this can result in "unbalanced" segments (ie, a huge segment appears after the long tail of level 0 segments), but, the merge policy can handle this – it'll work out whatever merges are then necessary to get this segment onto the level that roughly matches its size. So unless you have an IR extension, addDirectories is really the one you should be using. You mean addIndexes(Directory..)? BTW, addIndexes(IndexReader...) does not use neither the MergeScheduler nor MergePolicy, but rather call SegmentMerger directly. This might be another place for improvement. I'll look into it, and if it's not too complicated, I may cover it by this issue as well. If you have any hints that can give me a good head start on that, please don't be shy . This would be best of all But it's tricky, because our MP/MS assume they are working w/ a SegmentInfo. But, maybe it could somehow be made to work – eg IR does give us maxDoc, numDocs (so we can know del doc count). But eg LogByteSizeMergePolicy goes and computes total byte size of the segment (via SegmentInfo) which we cannot do from an IR.
        Hide
        Shai Erera added a comment -

        You mean addIndexes(Directory..)?

        Yes, copy-paste error.

        Maybe... we could do this: only merge the the incoming IndexReaders, appending a new segment to the end of the index?

        I like it. IMO, that's what the method should do anyway, for better performance and service to the users. If I'm adding indexes, that doesn't mean I want a whole merge process to kick off. If I want that, I can call maybeMerge or optimize afterwards.

        Basically, what I would like to add (and I'm not sure it belongs to this issue) is a "super fast" addIndexes method, something like registerIndexes, which doesn't even traverses the posting lists, removes deleted docs etc. - simply registering the new segments in the Directory. If needed - do a bulk copy of the files and update segments*. Simple as that. Maybe it does fit in that issue, as part of the general "house cleaning"?

        I will look more closely into supporting MP + MS w/ addIndexes(readers). Can't promise anything as I learn the code as I go .

        Show
        Shai Erera added a comment - You mean addIndexes(Directory..)? Yes, copy-paste error. Maybe... we could do this: only merge the the incoming IndexReaders, appending a new segment to the end of the index? I like it. IMO, that's what the method should do anyway, for better performance and service to the users. If I'm adding indexes, that doesn't mean I want a whole merge process to kick off. If I want that, I can call maybeMerge or optimize afterwards. Basically, what I would like to add (and I'm not sure it belongs to this issue) is a "super fast" addIndexes method, something like registerIndexes, which doesn't even traverses the posting lists, removes deleted docs etc. - simply registering the new segments in the Directory. If needed - do a bulk copy of the files and update segments*. Simple as that. Maybe it does fit in that issue, as part of the general "house cleaning"? I will look more closely into supporting MP + MS w/ addIndexes(readers). Can't promise anything as I learn the code as I go .
        Hide
        Michael McCandless added a comment -

        I agree, addIndexes should be minimal in the work it does...

        But bulk copy of the files isn't really possible for addIndexes(IR...) in general, since the readers can be arbitrary (eg FilterIndexReader).

        Show
        Michael McCandless added a comment - I agree, addIndexes should be minimal in the work it does... But bulk copy of the files isn't really possible for addIndexes(IR...) in general, since the readers can be arbitrary (eg FilterIndexReader).
        Hide
        Shai Erera added a comment -

        Ok. But since addIndexes(IR) is for IR extensions only, I think the number of people tha will be limited by it is very low.

        But, why wouldn't they be able to use the Directory... version of the method? Since it's a bulk copy, we don't need IR methods. Maybe just call dir.copyTo or something of that sort? The method will only be asked to copy files (in case they exist elsewhere). I was thinking of introducing just a Directoy version of such method.

        Basically, if you use NoMP and call addIndexesNoOptimize today, you get half of what I want, as only resolveExternals will be called. What I want is for the resolveExternals to be even faster, plain and shallow "resolution".

        Show
        Shai Erera added a comment - Ok. But since addIndexes(IR) is for IR extensions only, I think the number of people tha will be limited by it is very low. But, why wouldn't they be able to use the Directory... version of the method? Since it's a bulk copy, we don't need IR methods. Maybe just call dir.copyTo or something of that sort? The method will only be asked to copy files (in case they exist elsewhere). I was thinking of introducing just a Directoy version of such method. Basically, if you use NoMP and call addIndexesNoOptimize today, you get half of what I want, as only resolveExternals will be called. What I want is for the resolveExternals to be even faster, plain and shallow "resolution".
        Hide
        Michael McCandless added a comment -

        But, why wouldn't they be able to use the Directory... version of the method?

        Adding indexes using FilterIndexReader is useful – eg look @ how the multi-pass index splitter tool works.

        What I want is for the resolveExternals to be even faster, plain and shallow "resolution".

        For addIndexes(Directory), assuming the codecs are identical (the "write" codec equals the codec used to write the external segment), and assuming the doc stores of the external segment are private to it, I think we should be able to do a straight file-level copy, but renaming the segment in the process?

        Show
        Michael McCandless added a comment - But, why wouldn't they be able to use the Directory... version of the method? Adding indexes using FilterIndexReader is useful – eg look @ how the multi-pass index splitter tool works. What I want is for the resolveExternals to be even faster, plain and shallow "resolution". For addIndexes(Directory), assuming the codecs are identical (the "write" codec equals the codec used to write the external segment), and assuming the doc stores of the external segment are private to it, I think we should be able to do a straight file-level copy, but renaming the segment in the process?
        Hide
        Shai Erera added a comment -

        Adding indexes using FilterIndexReader is useful

        I'm not against that Mike. addIndexes should allow for both IndexReader and Directory. It's the registerIndexes (or whatever name we come up with) which should work with Directory only, and then, even if the app calls addIndexes with its own custom IR, it can still call registerIndexes w/ the Directory only, to do that fast copy/registration. Since no IR method will be involved in the process.

        So let's not confuse the two - addIndexes will exist and work as they are today. registerIndexes will be a new one.

        assuming the codecs are identical (the "write" codec equals the codec used to write the external segment), and assuming the doc stores of the external segment are private to it

        Right. Thanks for pointing that out, as it will become an important NOTE in the documentation. This method (registerIndexes) is definitely for advanced users, that have to know exactly what's in the foreign indexes. For example, I need this because I'm building several indexes on several nodes and then I want to add them to a central/master one. I know they don't have deletions, and each is already optimized. Therefore traversing the posting lists (as fast as it would be) is completely unnecessary.

        but renaming the segment in the process?

        Sure! I think we should really 'register' them in the Directory, as if they are the newly flushed segments. I'm sure you have a general idea on how this can be done? Assuming through SegmentInfos or something?

        Show
        Shai Erera added a comment - Adding indexes using FilterIndexReader is useful I'm not against that Mike. addIndexes should allow for both IndexReader and Directory. It's the registerIndexes (or whatever name we come up with) which should work with Directory only, and then, even if the app calls addIndexes with its own custom IR, it can still call registerIndexes w/ the Directory only, to do that fast copy/registration. Since no IR method will be involved in the process. So let's not confuse the two - addIndexes will exist and work as they are today. registerIndexes will be a new one. assuming the codecs are identical (the "write" codec equals the codec used to write the external segment), and assuming the doc stores of the external segment are private to it Right. Thanks for pointing that out, as it will become an important NOTE in the documentation. This method (registerIndexes) is definitely for advanced users, that have to know exactly what's in the foreign indexes. For example, I need this because I'm building several indexes on several nodes and then I want to add them to a central/master one. I know they don't have deletions, and each is already optimized. Therefore traversing the posting lists (as fast as it would be) is completely unnecessary. but renaming the segment in the process? Sure! I think we should really 'register' them in the Directory, as if they are the newly flushed segments. I'm sure you have a general idea on how this can be done? Assuming through SegmentInfos or something?
        Hide
        Shai Erera added a comment -

        While changing addIndexes(reader), I've noticed it first obtains read lock and then calls startTransaction(true). In between it calls flush + optimize, which I've removed (as we no longer want to do that). When I ran the tests, TestIndexWriter.testAddIndexesWithThreads failed on the assert in startTransaction about numDocsInRam != 0. That's expected as I no longer call flush. The failure does not occur always.

        In addIndexes(Dir) flush is called before startTransaction. But it makes sense to do it there, as the local segments are also merged. In the new addIndexes(reader) they won't and so I wonder if:

        • I shouldn't call startTransaction at all, or
        • I should, but also call flush before?
        Show
        Shai Erera added a comment - While changing addIndexes(reader), I've noticed it first obtains read lock and then calls startTransaction(true). In between it calls flush + optimize, which I've removed (as we no longer want to do that). When I ran the tests, TestIndexWriter.testAddIndexesWithThreads failed on the assert in startTransaction about numDocsInRam != 0. That's expected as I no longer call flush. The failure does not occur always. In addIndexes(Dir) flush is called before startTransaction. But it makes sense to do it there, as the local segments are also merged. In the new addIndexes(reader) they won't and so I wonder if: I shouldn't call startTransaction at all, or I should, but also call flush before?
        Hide
        Shai Erera added a comment -

        Patch handles the changes for 3x:

        • addIndexesNoOptimize deprecated - new addIndexes(Directory...) instead
        • CHANGES updates
        • addIndexes does not do optimize before
        • TestAddIndexesNoOptimize renamed to TestAddIndexes
        • Changed calls to addIndexesNoOpt to use addIndexes(Dir...) (except for backwards tests)
        • Changed textual references to addIndexesNoOpt.

        You should "svn mv lucene/src/test/org/apache/lucene/index/TestAddIndexesNoOptimize.java lucene/src/test/org/apache/lucene/index/TestAddIndexes.java" before you apply the patch.

        The patch is much smaller than it looks - it's the rename of TestAddIndexesNoOpt that takes a lot of space.

        As I've mentioned before, I'm not sure about addIndexes calling startTransaction w/o flushing. Even though the tests pass, this seems wrong. So I'd appreciate a review.

        Show
        Shai Erera added a comment - Patch handles the changes for 3x: addIndexesNoOptimize deprecated - new addIndexes(Directory...) instead CHANGES updates addIndexes does not do optimize before TestAddIndexesNoOptimize renamed to TestAddIndexes Changed calls to addIndexesNoOpt to use addIndexes(Dir...) (except for backwards tests) Changed textual references to addIndexesNoOpt. You should "svn mv lucene/src/test/org/apache/lucene/index/TestAddIndexesNoOptimize.java lucene/src/test/org/apache/lucene/index/TestAddIndexes.java" before you apply the patch. The patch is much smaller than it looks - it's the rename of TestAddIndexesNoOpt that takes a lot of space. As I've mentioned before, I'm not sure about addIndexes calling startTransaction w/o flushing. Even though the tests pass, this seems wrong. So I'd appreciate a review.
        Hide
        Michael McCandless added a comment -

        I think you should still call flush, and still start/commitTransaction – addIndexes is "all or nothing", which is why we have those transaction methods. Ie, on exception, the rollbackTransaction puts the original segments back.

        Show
        Michael McCandless added a comment - I think you should still call flush, and still start/commitTransaction – addIndexes is "all or nothing", which is why we have those transaction methods. Ie, on exception, the rollbackTransaction puts the original segments back.
        Hide
        Michael McCandless added a comment -

        Patch looks good Shai! Only a small typo in CHANGES (unles -> unless).

        Show
        Michael McCandless added a comment - Patch looks good Shai! Only a small typo in CHANGES (unles -> unless).
        Hide
        Shai Erera added a comment -

        I see. I understand why it's called in addIndexes(Dir), because the local segments are also touched. But now in the Reader version, they aren't. So it looked odd to me that we flush whatever is in RAM. I think you said once that addIndexes should have done the merge outside, adding the new segment when it's done?

        But if you think that flushing the RAM, even though its content is touched, is ok then I'll change it.

        Show
        Shai Erera added a comment - I see. I understand why it's called in addIndexes(Dir), because the local segments are also touched. But now in the Reader version, they aren't. So it looked odd to me that we flush whatever is in RAM. I think you said once that addIndexes should have done the merge outside, adding the new segment when it's done? But if you think that flushing the RAM, even though its content is touched, is ok then I'll change it.
        Hide
        Michael McCandless added a comment -

        I understand why it's called in addIndexes(Dir), because the local segments are also touched. But now in the Reader version, they aren't. So it looked odd to me that we flush whatever is in RAM.

        Yeah maybe we should no longer flush (but still call start/commitTransaction). I think there may've been a reason to flush first (besides that we were also merging local segments)... but I can't remember it. If you comment out that assert (and the corresponding assert for deletions) do any tests fail?

        I think you said once that addIndexes should have done the merge outside, adding the new segment when it's done?

        Yes, I would love to fix this – it'd mean we would not need the start/commit/rollbackTransaction code.

        Ie, we play a dangerous game now, where addIndexes is allowed to muck with the in-memory SegmentInfos before it's complete. It'd be better if all merging happened outside of its SegmentInfos, and only when addIndexes finished, it'd atomically commit to SegmentInfos.

        This would then allow commit() to run immediately, not having to wait for any running addIndexes to finish first. And we would not need to block add/updateDocument nor deleteDocuments while addIndexes is running.

        So, actually, I think in addIndexes(IR...) you should not use the transaction logic at all? Just do the merge externally & commit in the end? (And try not flushing as well.).

        Show
        Michael McCandless added a comment - I understand why it's called in addIndexes(Dir), because the local segments are also touched. But now in the Reader version, they aren't. So it looked odd to me that we flush whatever is in RAM. Yeah maybe we should no longer flush (but still call start/commitTransaction). I think there may've been a reason to flush first (besides that we were also merging local segments)... but I can't remember it. If you comment out that assert (and the corresponding assert for deletions) do any tests fail? I think you said once that addIndexes should have done the merge outside, adding the new segment when it's done? Yes, I would love to fix this – it'd mean we would not need the start/commit/rollbackTransaction code. Ie, we play a dangerous game now, where addIndexes is allowed to muck with the in-memory SegmentInfos before it's complete. It'd be better if all merging happened outside of its SegmentInfos, and only when addIndexes finished, it'd atomically commit to SegmentInfos. This would then allow commit() to run immediately, not having to wait for any running addIndexes to finish first. And we would not need to block add/updateDocument nor deleteDocuments while addIndexes is running. So, actually, I think in addIndexes(IR...) you should not use the transaction logic at all? Just do the merge externally & commit in the end? (And try not flushing as well.).
        Hide
        Shai Erera added a comment -

        Ha , I knew this will get more complicated and interesting ... So basically, it feels to me that if we'd have registerIndexes, we could in addIndexes merge outside IW and then call register?

        So far, tests pass w/ startTransaction. But that test is multi-threaded so it may be a concurrency issue. I'll try to do the addIndexes outside IW and then commit the new segment. If that will be straightforward, then I think I'll understand better how to develop registerIndexes.

        Show
        Shai Erera added a comment - Ha , I knew this will get more complicated and interesting ... So basically, it feels to me that if we'd have registerIndexes, we could in addIndexes merge outside IW and then call register? So far, tests pass w/ startTransaction. But that test is multi-threaded so it may be a concurrency issue. I'll try to do the addIndexes outside IW and then commit the new segment. If that will be straightforward, then I think I'll understand better how to develop registerIndexes.
        Hide
        Michael McCandless added a comment -

        Ha , I knew this will get more complicated and interesting ..

        It always does!

        So basically, it feels to me that if we'd have registerIndexes, we could in addIndexes merge outside IW and then call register?

        Hmm but register will be an external API for copying over segments in a foreign directory, right? (And segment must be renamed).

        Vs these segments which will be in our directory already, with the right segment name, and just need to be committed to the segmentInfos?

        So far, tests pass w/ startTransaction.

        You mean w/o the flush?

        Show
        Michael McCandless added a comment - Ha , I knew this will get more complicated and interesting .. It always does! So basically, it feels to me that if we'd have registerIndexes, we could in addIndexes merge outside IW and then call register? Hmm but register will be an external API for copying over segments in a foreign directory, right? (And segment must be renamed). Vs these segments which will be in our directory already, with the right segment name, and just need to be committed to the segmentInfos? So far, tests pass w/ startTransaction. You mean w/o the flush?
        Hide
        Shai Erera added a comment -

        You mean w/o the flush?

        Yes. The start/commit transaction looks like that:

        startTransaction(false);
        
        try {
          mergedName = newSegmentName();
          merger = new SegmentMerger(this, mergedName, null);
        
          for (IndexReader reader : readers)      // add new indexes
            merger.add(reader);
                
          int docCount = merger.merge();                // merge 'em
                
          synchronized(this) {
            info = new SegmentInfo(mergedName, docCount, directory, false, true,
                          -1, null, false, merger.hasProx());
            setDiagnostics(info, "addIndexes(IndexReader...)");
            segmentInfos.add(info);
          }
                
          // Notify DocumentsWriter that the flushed count just increased
          docWriter.updateFlushedDocCount(docCount);
                
          success = true;
        } finally {
          if (!success) {
            rollbackTransaction();
          } else {
            commitTransaction();
          }
        }
        
        • A new segment name is generated
        • All readers but the current one are merged
        • The new SI is added to the writer's SIs
        • DocWriter's updateFlushedDocCount is updated
        • The transaction is committed or rolled back if there was an error.

        So this looks like it already does the merge "on the side" and when it's done the new segment is registered?

        Show
        Shai Erera added a comment - You mean w/o the flush? Yes. The start/commit transaction looks like that: startTransaction( false ); try { mergedName = newSegmentName(); merger = new SegmentMerger( this , mergedName, null ); for (IndexReader reader : readers) // add new indexes merger.add(reader); int docCount = merger.merge(); // merge 'em synchronized ( this ) { info = new SegmentInfo(mergedName, docCount, directory, false , true , -1, null , false , merger.hasProx()); setDiagnostics(info, "addIndexes(IndexReader...)" ); segmentInfos.add(info); } // Notify DocumentsWriter that the flushed count just increased docWriter.updateFlushedDocCount(docCount); success = true ; } finally { if (!success) { rollbackTransaction(); } else { commitTransaction(); } } A new segment name is generated All readers but the current one are merged The new SI is added to the writer's SIs DocWriter's updateFlushedDocCount is updated The transaction is committed or rolled back if there was an error. So this looks like it already does the merge "on the side" and when it's done the new segment is registered?
        Hide
        Shai Erera added a comment -

        In fact, I've create newAddIndexes (just for the review) which works like that:

          public void newAddIndexes(IndexReader... readers) throws CorruptIndexException, IOException {
        
            ensureOpen();
        
            try {
              String mergedName = newSegmentName();
              SegmentMerger merger = new SegmentMerger(this, mergedName, null);
              
              for (IndexReader reader : readers)      // add new indexes
                merger.add(reader);
              
              int docCount = merger.merge();                // merge 'em
              
              SegmentInfo info = null;
              synchronized(this) {
                info = new SegmentInfo(mergedName, docCount, directory, false, true,
                    -1, null, false, merger.hasProx());
                setDiagnostics(info, "addIndexes(IndexReader...)");
                segmentInfos.add(info);
              }
              
              // Notify DocumentsWriter that the flushed count just increased
              docWriter.updateFlushedDocCount(docCount);
              
              // Now create the compound file if needed
              if (mergePolicy instanceof LogMergePolicy && getUseCompoundFile()) {
        
                List<String> files = null;
        
                synchronized(this) {
                  // Must incRef our files so that if another thread
                  // is running merge/optimize, it doesn't delete our
                  // segment's files before we have a chance to
                  // finish making the compound file.
                  if (segmentInfos.contains(info)) {
                    files = info.files();
                    deleter.incRef(files);
                  }
                }
        
                if (files != null) {
                  try {
                    merger.createCompoundFile(mergedName + ".cfs");
                    synchronized(this) {
                      info.setUseCompoundFile(true);
                    }
                  } finally {
                    deleter.decRef(files);
                  }
                }
              }
            } catch (OutOfMemoryError oom) {
              handleOOM(oom, "addIndexes(IndexReader...)");
            } finally {
              if (docWriter != null) {
                docWriter.resumeAllThreads();
              }
            }
          }
        

        Question: if we've just added the new SI to segmentInfos, why do we sync on this and check if it exists (when we create the compound file)? Is it because there could be a running merge which will merge it into a new segment before we reach that point?

        What do you think? Is that what you had in mind about merging on the side and committing in the end?

        Show
        Shai Erera added a comment - In fact, I've create newAddIndexes (just for the review) which works like that: public void newAddIndexes(IndexReader... readers) throws CorruptIndexException, IOException { ensureOpen(); try { String mergedName = newSegmentName(); SegmentMerger merger = new SegmentMerger( this , mergedName, null ); for (IndexReader reader : readers) // add new indexes merger.add(reader); int docCount = merger.merge(); // merge 'em SegmentInfo info = null ; synchronized ( this ) { info = new SegmentInfo(mergedName, docCount, directory, false , true , -1, null , false , merger.hasProx()); setDiagnostics(info, "addIndexes(IndexReader...)" ); segmentInfos.add(info); } // Notify DocumentsWriter that the flushed count just increased docWriter.updateFlushedDocCount(docCount); // Now create the compound file if needed if (mergePolicy instanceof LogMergePolicy && getUseCompoundFile()) { List< String > files = null ; synchronized ( this ) { // Must incRef our files so that if another thread // is running merge/optimize, it doesn't delete our // segment's files before we have a chance to // finish making the compound file. if (segmentInfos.contains(info)) { files = info.files(); deleter.incRef(files); } } if (files != null ) { try { merger.createCompoundFile(mergedName + ".cfs" ); synchronized ( this ) { info.setUseCompoundFile( true ); } } finally { deleter.decRef(files); } } } } catch (OutOfMemoryError oom) { handleOOM(oom, "addIndexes(IndexReader...)" ); } finally { if (docWriter != null ) { docWriter.resumeAllThreads(); } } } Question: if we've just added the new SI to segmentInfos, why do we sync on this and check if it exists (when we create the compound file)? Is it because there could be a running merge which will merge it into a new segment before we reach that point? What do you think? Is that what you had in mind about merging on the side and committing in the end?
        Hide
        Michael McCandless added a comment -

        Question: if we've just added the new SI to segmentInfos, why do we sync on this and check if it exists (when we create the compound file)? Is it because there could be a running merge which will merge it into a new segment before we reach that point?

        Yes, exactly.

        What do you think? Is that what you had in mind about merging on the side and committing in the end?

        Yup! This looks great.... though I think you should move the docWriter.updateFlushedDocCount into the sync above it? We didn't have to do this before because we blocked all add/updateDocument calls.

        Also, you shouldn't call docWriter.resumeAllThreads (you didn't pause them).

        So this change is a great step forward in concurrency of addIndexes(IndexReader...)!

        Show
        Michael McCandless added a comment - Question: if we've just added the new SI to segmentInfos, why do we sync on this and check if it exists (when we create the compound file)? Is it because there could be a running merge which will merge it into a new segment before we reach that point? Yes, exactly. What do you think? Is that what you had in mind about merging on the side and committing in the end? Yup! This looks great.... though I think you should move the docWriter.updateFlushedDocCount into the sync above it? We didn't have to do this before because we blocked all add/updateDocument calls. Also, you shouldn't call docWriter.resumeAllThreads (you didn't pause them). So this change is a great step forward in concurrency of addIndexes(IndexReader...)!
        Hide
        Shai Erera added a comment -

        Also, you shouldn't call docWriter.resumeAllThreads (you didn't pause them).

        Oops, missed that . Thanks !

        I'll replace addIndexes w/ this code and run tests to check how it flies.

        Show
        Shai Erera added a comment - Also, you shouldn't call docWriter.resumeAllThreads (you didn't pause them). Oops, missed that . Thanks ! I'll replace addIndexes w/ this code and run tests to check how it flies.
        Hide
        Shai Erera added a comment -

        I've looked into implementing registerIndexes, and that's the approach I'd like to take:

        • For each incoming Directory, read its SegmentInfos
        • For each SegmentInfo:
          • Generate a new segment name
          • List its files
          • Copy them from incoming Dir to local Dir, w/ the new segment name
          • Add such SI to the local IW segmentInfos
          • Update DW docCount, like it's done in the addIndexes* methods.

        Few things:

        1. Does that sound reasonable? Am I missing something?
        2. Directory exposes a copyTo(Dir, Collection) which I thought to use. But the files are copied to the target Dir w/ their current name - while I need to copy them over w/ their new name.
          • Adding rename to Dir feels wrong and dangerous to me
          • Adding copyFile(Dir, String old, String new) seems ok
          • Adding a variant of copyTo which accepts a Collection of the new names - the src and new should align. This also seems ok to me.

        I'd like to use Directory for the copy, since impls of Dir may do the copy very efficiently (i.e. FSDir vs. RAMDir) and I don't want to use IndexInput/Output for that.

        Do you know of another way I can achieve that? I only want to copy the actual segment files, w/o .gen and segments_N, so calling SI.files() seems ok?

        Another question that popped into my head was about consistency of the incoming Dirs vs. the local one, w.r.t. to CFS files - should I worry about that? I think not because today one can create an index w/ CFS and then turn it off and some segments will be compound and others not?

        Show
        Shai Erera added a comment - I've looked into implementing registerIndexes, and that's the approach I'd like to take: For each incoming Directory, read its SegmentInfos For each SegmentInfo: Generate a new segment name List its files Copy them from incoming Dir to local Dir, w/ the new segment name Add such SI to the local IW segmentInfos Update DW docCount, like it's done in the addIndexes* methods. Few things: Does that sound reasonable? Am I missing something? Directory exposes a copyTo(Dir, Collection) which I thought to use. But the files are copied to the target Dir w/ their current name - while I need to copy them over w/ their new name. Adding rename to Dir feels wrong and dangerous to me Adding copyFile(Dir, String old, String new) seems ok Adding a variant of copyTo which accepts a Collection of the new names - the src and new should align. This also seems ok to me. I'd like to use Directory for the copy, since impls of Dir may do the copy very efficiently (i.e. FSDir vs. RAMDir) and I don't want to use IndexInput/Output for that. Do you know of another way I can achieve that? I only want to copy the actual segment files, w/o .gen and segments_N, so calling SI.files() seems ok? Another question that popped into my head was about consistency of the incoming Dirs vs. the local one, w.r.t. to CFS files - should I worry about that? I think not because today one can create an index w/ CFS and then turn it off and some segments will be compound and others not?
        Hide
        Michael McCandless added a comment -

        I've looked into implementing registerIndexes, and that's the approach I'd like to take:

        This looks good.

        Though if the src segments share docStores, you can't do a simple
        copy (I think you have to fallback to the resolveExternalSegments
        approach for such segments).

        Does that sound reasonable? Am I missing something?

        I think this should work!

        If the src segments are an older index rev, I think you are still OK.
        They will just remain "old" on copy, and merge will eventually migrate
        them forward.

        For trunk... you should note in the jdocs that no codec conversion
        takes place. So the CodecProvider used in IW (and later used to read
        this index) must know how to provide the codec used by the src
        segments.

        Directory exposes a copyTo(Dir, Collection) which I thought to use. But the files are copied to the target Dir w/ their current name - while I need to copy them over w/ their new name.
        Adding rename to Dir feels wrong and dangerous to me
        Adding copyFile(Dir, String old, String new) seems ok
        Adding a variant of copyTo which accepts a Collection of the new names - the src and new should align. This also seems ok to me.
        I'd like to use Directory for the copy, since impls of Dir may do the copy very efficiently (i.e. FSDir vs. RAMDir) and I don't want to use IndexInput/Output for that.

        Do you know of another way I can achieve that? I only want to copy the actual segment files, w/o .gen and segments_N, so calling SI.files() seems ok?

        SI.files() should be fine.

        I think falling back to copyFile is best? Then copyTo could use it.

        Another question that popped into my head was about consistency of the
        incoming Dirs vs. the local one, w.r.t. to CFS files - should I worry
        about that? I think not because today one can create an index w/ CFS
        and then turn it off and some segments will be compound and others
        not?

        I think that's fine, but we should advertise in the jdocs.

        Show
        Michael McCandless added a comment - I've looked into implementing registerIndexes, and that's the approach I'd like to take: This looks good. Though if the src segments share docStores, you can't do a simple copy (I think you have to fallback to the resolveExternalSegments approach for such segments). Does that sound reasonable? Am I missing something? I think this should work! If the src segments are an older index rev, I think you are still OK. They will just remain "old" on copy, and merge will eventually migrate them forward. For trunk... you should note in the jdocs that no codec conversion takes place. So the CodecProvider used in IW (and later used to read this index) must know how to provide the codec used by the src segments. Directory exposes a copyTo(Dir, Collection) which I thought to use. But the files are copied to the target Dir w/ their current name - while I need to copy them over w/ their new name. Adding rename to Dir feels wrong and dangerous to me Adding copyFile(Dir, String old, String new) seems ok Adding a variant of copyTo which accepts a Collection of the new names - the src and new should align. This also seems ok to me. I'd like to use Directory for the copy, since impls of Dir may do the copy very efficiently (i.e. FSDir vs. RAMDir) and I don't want to use IndexInput/Output for that. Do you know of another way I can achieve that? I only want to copy the actual segment files, w/o .gen and segments_N, so calling SI.files() seems ok? SI.files() should be fine. I think falling back to copyFile is best? Then copyTo could use it. Another question that popped into my head was about consistency of the incoming Dirs vs. the local one, w.r.t. to CFS files - should I worry about that? I think not because today one can create an index w/ CFS and then turn it off and some segments will be compound and others not? I think that's fine, but we should advertise in the jdocs.
        Hide
        Shai Erera added a comment -

        So it sounds like addIndexes should really be that registerIndexes. Specifically it should do a quick and dirty copy of all segments that don't share doc stores, and then resolveExternals those that do? Maybe we can get rid of those transactions and not block add/update/delete/commit/addIndexes attempts anymore?

        Usually, I expect this to be a win-win. In cases where you add Directories w/ plenty of segments that share doc stores it will be slower, because we won't utilize MP and MS. But this can be improved in the future as well by e.g. just taking care of the shared doc stores, and don't remove deleted document entries etc.

        But .. it will prevent (in some cases) the use of PayloadProcessorProvider ... hmm. So it seems we do need a separate registerIndexes for the really quick & dirty addIndexes operation.

        BTW, Directory.copyTo* should be replaced w/ Directory.copy(Dir, File, File), for a couple of reasons:

        • There's no reason to believe the dest file should be named the same as the source file
        • The method is not entirely safe - only jdocs protect the user from doing really stupid thing such as overwriting the segments* files.
        • I don't see a proper usecase for that method, other than copying a Directory into an empty one. At least, other use cases are very dangerous.

        Instead, the user should do this logic outside - call dir.listAll(), w/ and w/o FilenameFilter and copy the files of interest, and be allowed to rename them in the process.

        Show
        Shai Erera added a comment - So it sounds like addIndexes should really be that registerIndexes. Specifically it should do a quick and dirty copy of all segments that don't share doc stores, and then resolveExternals those that do? Maybe we can get rid of those transactions and not block add/update/delete/commit/addIndexes attempts anymore? Usually, I expect this to be a win-win. In cases where you add Directories w/ plenty of segments that share doc stores it will be slower, because we won't utilize MP and MS. But this can be improved in the future as well by e.g. just taking care of the shared doc stores, and don't remove deleted document entries etc. But .. it will prevent (in some cases) the use of PayloadProcessorProvider ... hmm. So it seems we do need a separate registerIndexes for the really quick & dirty addIndexes operation. BTW, Directory.copyTo* should be replaced w/ Directory.copy(Dir, File, File), for a couple of reasons: There's no reason to believe the dest file should be named the same as the source file The method is not entirely safe - only jdocs protect the user from doing really stupid thing such as overwriting the segments* files. I don't see a proper usecase for that method, other than copying a Directory into an empty one. At least, other use cases are very dangerous. Instead, the user should do this logic outside - call dir.listAll(), w/ and w/o FilenameFilter and copy the files of interest, and be allowed to rename them in the process.
        Hide
        Shai Erera added a comment -

        So ... after I slept over it, I don't think I can easily let go of the so-near victory – having addIndexes not blocking anything, done on the side, be as efficient as possible and give up a chance to do some serious code cleanup . So I'd like to propose the following, we can discuss them one by one, but they all aim at the same thing:

        1. addIndexes(Directory ...) will be a quick & dirty copy of segments. No deleted docs removal in the process, no merges.
          • It's clear how this can be done for segments that don't share doc stores.
          • What isn't clear to me is why can't this work for segments that do share doc stores - can't we copy all of them at once, and add to segmentInfos when the segments + their doc store were copied? So, if I have 5 segments, w/ the following doc stores: 1-2, 3 and 4-5 (3 stores for 5 segments), can't I copy 1+2+store, 3, 4+5+store? Wouldn't that work? I'll give it a shot.
        2. PayloadProcessorProvider won't be used in the process. If you want, you can set it on the source writer, and call optimize(). Performance-wise it will be nearly identical - instead of processing the postings + payloads during addIndexes, you'll do that during optimize() (all segments will be processed anyway) and addIndexes will do a bulk IO copy, which on most modern machines is really cheap.
          • Further more, you will end up w/ just one segment, which means it can be copied at once for sure.
          • It will also simplify PPP – no need for DirPP anymore. PPP would get a Term and return a PP for that term, as it will always work on just one Directory.
          • People can still use it with the target IW if they want, but not through addIndexes.
        3. Apps can call maybeMerge or optimize() following addIndexes, if they want to.

        What remains is addIndexes(IndexReader...) and I'm not sure why this cannot be removed. In the back of my head I remember a discussion about it once, so I guess there is a good reason. But at least from what I see now, and armed w/ my use cases only, it seems like even if you use an extension of IndexReader you should still be able to do a bulk copy? Hmm ... unless if your extension assumes different postings structure or something like that, which the regular SegmentReader won't know about – then during addIndexes those postings are converted.

        But, how common is this? And wouldn't it be better if such indexes are migrated beforehand? I mean, anyway after addIndexes those postings won't retain the custom IndexReader-assuming format? Or is there another reason?

        If we go with that, then SegmentMerger can be simplified as well, assuming only SegmentReader?

        What do you think?

        Show
        Shai Erera added a comment - So ... after I slept over it, I don't think I can easily let go of the so-near victory – having addIndexes not blocking anything, done on the side, be as efficient as possible and give up a chance to do some serious code cleanup . So I'd like to propose the following, we can discuss them one by one, but they all aim at the same thing: addIndexes(Directory ...) will be a quick & dirty copy of segments. No deleted docs removal in the process, no merges. It's clear how this can be done for segments that don't share doc stores. What isn't clear to me is why can't this work for segments that do share doc stores - can't we copy all of them at once, and add to segmentInfos when the segments + their doc store were copied? So, if I have 5 segments, w/ the following doc stores: 1-2, 3 and 4-5 (3 stores for 5 segments), can't I copy 1+2+store, 3, 4+5+store? Wouldn't that work? I'll give it a shot. PayloadProcessorProvider won't be used in the process. If you want, you can set it on the source writer, and call optimize(). Performance-wise it will be nearly identical - instead of processing the postings + payloads during addIndexes, you'll do that during optimize() (all segments will be processed anyway) and addIndexes will do a bulk IO copy, which on most modern machines is really cheap. Further more, you will end up w/ just one segment, which means it can be copied at once for sure. It will also simplify PPP – no need for DirPP anymore. PPP would get a Term and return a PP for that term, as it will always work on just one Directory. People can still use it with the target IW if they want, but not through addIndexes. Apps can call maybeMerge or optimize() following addIndexes, if they want to. What remains is addIndexes(IndexReader...) and I'm not sure why this cannot be removed. In the back of my head I remember a discussion about it once, so I guess there is a good reason. But at least from what I see now, and armed w/ my use cases only, it seems like even if you use an extension of IndexReader you should still be able to do a bulk copy? Hmm ... unless if your extension assumes different postings structure or something like that, which the regular SegmentReader won't know about – then during addIndexes those postings are converted. But, how common is this? And wouldn't it be better if such indexes are migrated beforehand? I mean, anyway after addIndexes those postings won't retain the custom IndexReader-assuming format? Or is there another reason? If we go with that, then SegmentMerger can be simplified as well, assuming only SegmentReader? What do you think?
        Hide
        Michael McCandless added a comment -

        What isn't clear to me is why can't this work for segments that do share doc stores

        You are right!

        If we copy over the doc stores, also renaming them, and fixup the incoming SegmentInto to reference the newly named one, this should work fine!

        PayloadProcessorProvider won't be used in the process.

        This (and not needing DirPP anymore) is a great simplification.

        What remains is addIndexes(IndexReader...) and I'm not sure why this cannot be removed.

        I think we still need it... look at how multi-pass index splitter (contrib/misc) works.

        Show
        Michael McCandless added a comment - What isn't clear to me is why can't this work for segments that do share doc stores You are right! If we copy over the doc stores, also renaming them, and fixup the incoming SegmentInto to reference the newly named one, this should work fine! PayloadProcessorProvider won't be used in the process. This (and not needing DirPP anymore) is a great simplification. What remains is addIndexes(IndexReader...) and I'm not sure why this cannot be removed. I think we still need it... look at how multi-pass index splitter (contrib/misc) works.
        Hide
        Shai Erera added a comment -

        look at how multi-pass index splitter (contrib/misc) works.

        I see ...

        I think this can be achieved by also deleting docs that fall outside the split range and calling optimize() / expungeDeletes()? So, you copy the index once (using one of the copying addIndexes methods), delete the docs that you don't care about and optimize/expunge. Then copy the index again and repeat the process, w/ a different range of ids. In fact, that's more or less what the method does - only it calls addIndexes, to copy into an existing/empty Directory.

        So I think it can be changed to not call addIndexes? Only problem is that now the method adds the docs into the target Directory directly, while in the other solution it will need to create a Directory on the side w/ the range of requested IDs and then copy that one into the target Dir?

        But I'm not sure that's worth it to have addIndexes(IndexReader...) and the relevant code in SM which handles the non-SegmentReader readers? Of course, this is just one scenario, but if that's our justifying case, then I'm not sure about how justifying it is.

        Show
        Shai Erera added a comment - look at how multi-pass index splitter (contrib/misc) works. I see ... I think this can be achieved by also deleting docs that fall outside the split range and calling optimize() / expungeDeletes()? So, you copy the index once (using one of the copying addIndexes methods), delete the docs that you don't care about and optimize/expunge. Then copy the index again and repeat the process, w/ a different range of ids. In fact, that's more or less what the method does - only it calls addIndexes, to copy into an existing/empty Directory. So I think it can be changed to not call addIndexes? Only problem is that now the method adds the docs into the target Directory directly, while in the other solution it will need to create a Directory on the side w/ the range of requested IDs and then copy that one into the target Dir? But I'm not sure that's worth it to have addIndexes(IndexReader...) and the relevant code in SM which handles the non-SegmentReader readers? Of course, this is just one scenario, but if that's our justifying case, then I'm not sure about how justifying it is.
        Hide
        Andrzej Bialecki added a comment - - edited

        FYI, I'm working on a different version of IndexSplitter that uses the logic in SegmentMerger directly, without going through IW.addIndexes(FilterIndexReader).

        However, there are other applications for which this API is crucial, e.g. LUCENE-1812 or IndexSorter (in Nutch) - in short, any client apps that want to merge-in index data that does not correspond 1:1 to a Directory. For this reason I think the pair of IndexWriter.addIndexes(IndexReader...) and FilterIndexReader abstraction is extremely useful and that IndexWriter.addIndexes(Directory...) is not a sufficient replacement.

        (edit: unless there is a better user-level API based on the flex producers/consumers...)

        Show
        Andrzej Bialecki added a comment - - edited FYI, I'm working on a different version of IndexSplitter that uses the logic in SegmentMerger directly, without going through IW.addIndexes(FilterIndexReader). However, there are other applications for which this API is crucial, e.g. LUCENE-1812 or IndexSorter (in Nutch) - in short, any client apps that want to merge-in index data that does not correspond 1:1 to a Directory. For this reason I think the pair of IndexWriter.addIndexes(IndexReader...) and FilterIndexReader abstraction is extremely useful and that IndexWriter.addIndexes(Directory...) is not a sufficient replacement. (edit: unless there is a better user-level API based on the flex producers/consumers...)
        Hide
        Shai Erera added a comment -

        any client apps that want to merge-in index data that does not correspond 1:1 to a Directory

        I understand that. But when you call addIndexes w/ such IndexReaders, all they do is read the postings. Those are written down using the logic of the target IndexWriter. So I wonder how important is it for addIndexes to be in place, rather than say rewriting those indexes before they are added? I mean, all that addIndexes will do is call SegmentMerger and iterate on the readers and segments, merging the posting lists ...

        I don't object to that API. But, SM is used extensively, and is more of a main-path code, while addIndexes(IndexReader) is something only few out there use. Yet it affects everyone else who reads SM code, as well as those of us who are confused about which method to call (Reader or Directory) ... It almost feels like such operation - the relevant code from SM which handles non-SegmentReaders, should be extracted to a utility or something. But if I'm the only one that's bothered by it, then so be it. I can take care of the rest now, and resolve that one later.

        Show
        Shai Erera added a comment - any client apps that want to merge-in index data that does not correspond 1:1 to a Directory I understand that. But when you call addIndexes w/ such IndexReaders, all they do is read the postings. Those are written down using the logic of the target IndexWriter. So I wonder how important is it for addIndexes to be in place, rather than say rewriting those indexes before they are added? I mean, all that addIndexes will do is call SegmentMerger and iterate on the readers and segments, merging the posting lists ... I don't object to that API. But, SM is used extensively, and is more of a main-path code, while addIndexes(IndexReader) is something only few out there use. Yet it affects everyone else who reads SM code, as well as those of us who are confused about which method to call (Reader or Directory) ... It almost feels like such operation - the relevant code from SM which handles non-SegmentReaders, should be extracted to a utility or something. But if I'm the only one that's bothered by it, then so be it. I can take care of the rest now, and resolve that one later.
        Hide
        Andrzej Bialecki added a comment -

        I understand - see the edited section in my comment: I think that extracting this non-SR code would be great. I would be in fact glad if there was an easier to control API that allows us to directly stream-process postings / stored / tvf-s / etc. in a way that results in a functioning index. Take for example LUCENE-1812 - the only reason it uses addIndexes(IndexReader) is that there was no easy way to modify postings in a way that would still result in a valid index, and there was no other API to add artificially created postings (i.e. not coming from a Directory) to a target index.

        Show
        Andrzej Bialecki added a comment - I understand - see the edited section in my comment: I think that extracting this non-SR code would be great. I would be in fact glad if there was an easier to control API that allows us to directly stream-process postings / stored / tvf-s / etc. in a way that results in a functioning index. Take for example LUCENE-1812 - the only reason it uses addIndexes(IndexReader) is that there was no easy way to modify postings in a way that would still result in a valid index, and there was no other API to add artificially created postings (i.e. not coming from a Directory) to a target index.
        Hide
        Shai Erera added a comment -

        So can't the PrunningReader run on the side, converting the postings to whatever they're supposed to look like in the index they are about to be added to, and then call addIndexes w/ the Directory to do the bulk copy? I mean, instead of looking for a standalone tool, perhaps this can be solved on a case by case basis? Of course, if this can be made generic enough, then we can add it as a core utility, or IW method.

        Show
        Shai Erera added a comment - So can't the PrunningReader run on the side, converting the postings to whatever they're supposed to look like in the index they are about to be added to, and then call addIndexes w/ the Directory to do the bulk copy? I mean, instead of looking for a standalone tool, perhaps this can be solved on a case by case basis? Of course, if this can be made generic enough, then we can add it as a core utility, or IW method.
        Hide
        Andrzej Bialecki added a comment -

        So can't the PrunningReader run on the side, converting the postings to whatever they're supposed to look like

        Erhm ... Currently the only way in the user API to write out existing postings (no matter how created) is to use IndexWriter.addIndexes(IndexReader). We can read postings just fine, using various *Enum classes that we can obtain from IndexReader, but there are no comparable high-level output methods - Codecs and other flex classes are IMHO too low-level.

        Also, with large indexes the amount of IO/CPU for writing out a Directory and reopening it is non-trivial - it's much more efficient to do this via streaming from the original, already open index.

        Also, if we remove this method, then FilterIndexReader may as well go too, because it loses its utility.

        Show
        Andrzej Bialecki added a comment - So can't the PrunningReader run on the side, converting the postings to whatever they're supposed to look like Erhm ... Currently the only way in the user API to write out existing postings (no matter how created) is to use IndexWriter.addIndexes(IndexReader). We can read postings just fine, using various *Enum classes that we can obtain from IndexReader, but there are no comparable high-level output methods - Codecs and other flex classes are IMHO too low-level. Also, with large indexes the amount of IO/CPU for writing out a Directory and reopening it is non-trivial - it's much more efficient to do this via streaming from the original, already open index. Also, if we remove this method, then FilterIndexReader may as well go too, because it loses its utility.
        Hide
        Shai Erera added a comment -

        Ok let's keep addIndexes(IndexReader) around. This means though that we cannot simplify the PPP API. We'll still need DirPP.

        Show
        Shai Erera added a comment - Ok let's keep addIndexes(IndexReader) around. This means though that we cannot simplify the PPP API. We'll still need DirPP.
        Hide
        Shai Erera added a comment -

        I've started to implement addIndexes(Directory...) as agreed - copy files from the incoming ones into the local directory, while renaming them on the fly. This works really well with non-CFS segments: a new segment name is generated, the incoming files are renamed and this all flies smoothly (didn't test w/ deletions yet) - even shared doc stores work great.

        But with CFS it doesn't work well because CFS writes the file names in the CFS file itself, and so even if the segment is renamed to _5 (for example), the names that are written in the file are _2.* (for example), and openInput fails to locate them. To overcome this, I propose we do the following:

        • Introduce on IndexFileNames a stripName method (3x and trunk) - will return the file name w/o the _x part.
        • CFR ctor - strip names of read file names by calling IFN.stripName --> 3x only
        • CFR.openInput - strip name by calling IFN.stripName --> 3x and trunk
        • Document that files should be created through IFN only --> 3x (for clarity) and trunk (otherwise may not be supported).
        • Not save the name in CFS --> trunk only. Will remove the need to strip it off when it's read.

        That will ensure that files are named following a certain convention which we can rely on in CFR. I don't think it's too hard to ask for. CFS itself already knows the name - it's named like it. So there's no value in storing the names of the files it holds.

        For 3x it should work well b/c we don't allow for custom index files. For trunk we'll ask to go through IFN to name files - so one can create mycustom.file through IFN which will be called _x_mycustom.file.

        What do you think?

        Show
        Shai Erera added a comment - I've started to implement addIndexes(Directory...) as agreed - copy files from the incoming ones into the local directory, while renaming them on the fly. This works really well with non-CFS segments: a new segment name is generated, the incoming files are renamed and this all flies smoothly (didn't test w/ deletions yet) - even shared doc stores work great. But with CFS it doesn't work well because CFS writes the file names in the CFS file itself, and so even if the segment is renamed to _5 (for example), the names that are written in the file are _2.* (for example), and openInput fails to locate them. To overcome this, I propose we do the following: Introduce on IndexFileNames a stripName method (3x and trunk) - will return the file name w/o the _x part. CFR ctor - strip names of read file names by calling IFN.stripName --> 3x only CFR.openInput - strip name by calling IFN.stripName --> 3x and trunk Document that files should be created through IFN only --> 3x (for clarity) and trunk (otherwise may not be supported). Not save the name in CFS --> trunk only. Will remove the need to strip it off when it's read. That will ensure that files are named following a certain convention which we can rely on in CFR. I don't think it's too hard to ask for. CFS itself already knows the name - it's named like it. So there's no value in storing the names of the files it holds. For 3x it should work well b/c we don't allow for custom index files. For trunk we'll ask to go through IFN to name files - so one can create mycustom.file through IFN which will be called _x_mycustom.file. What do you think?
        Hide
        Michael McCandless added a comment -

        Ahh sneaky that CFS still embeds the old segment's name (you're right). The only other option would be to rewrite the CFS header, but then that's not easy to do a bulk copy on. So I like you're approach!

        We should document in Codec.java that this (you must gen your filename via IFN's APIs) is a requirement of any custom files your codec wants to store in the index.

        Show
        Michael McCandless added a comment - Ahh sneaky that CFS still embeds the old segment's name (you're right). The only other option would be to rewrite the CFS header, but then that's not easy to do a bulk copy on. So I like you're approach! We should document in Codec.java that this (you must gen your filename via IFN's APIs) is a requirement of any custom files your codec wants to store in the index.
        Hide
        Shai Erera added a comment -

        Patch includes the following:

        • addIndexesNoOpt renamed to addIndexes2 for now, until we resolve the failing test (see below). I'll remove it and fix jdocs accordingly afterwards.
        • addIndexes(Dir...) implements the simple file copy strategy.
        • Tests updated accordingly.
        • Some minor changes to CompoundFileReader and IndexFileNames, as discussed before.

        All tests pass except for TestIndexWriter.testAddIndexesWithThreads. I've debugged it, but cannot find the reason. addIndexes copies all segments, before it adds them to the writer's segmentInfos. Maybe I need to use start/commit transaction on that part only, to lock all ops? I don't see why, but maybe?

        Also, TestAddIndexes.testWithPendingDeletes2() (and some others) fail before I added a call to flush to addIndexes. It seems that w/o it, existing buffered deleted docs are ignored after addIndexes returns (even when no multi-threading is involved). Can someone please confirm that?

        Also, I cannot simplify PPP (to remove DirPP) because we kept addIndexes(Reader...). It's an annoyance if you don't call this method (need to return a DirPP for the target Dir always - if you want to use it), but maybe not so bad ...

        Show
        Shai Erera added a comment - Patch includes the following: addIndexesNoOpt renamed to addIndexes2 for now, until we resolve the failing test (see below). I'll remove it and fix jdocs accordingly afterwards. addIndexes(Dir...) implements the simple file copy strategy. Tests updated accordingly. Some minor changes to CompoundFileReader and IndexFileNames, as discussed before. All tests pass except for TestIndexWriter.testAddIndexesWithThreads. I've debugged it, but cannot find the reason. addIndexes copies all segments, before it adds them to the writer's segmentInfos. Maybe I need to use start/commit transaction on that part only, to lock all ops? I don't see why, but maybe? Also, TestAddIndexes.testWithPendingDeletes2() (and some others) fail before I added a call to flush to addIndexes. It seems that w/o it, existing buffered deleted docs are ignored after addIndexes returns (even when no multi-threading is involved). Can someone please confirm that? Also, I cannot simplify PPP (to remove DirPP) because we kept addIndexes(Reader...). It's an annoyance if you don't call this method (need to return a DirPP for the target Dir always - if you want to use it), but maybe not so bad ...
        Hide
        Shai Erera added a comment -

        Attached patch includes:

        • Fixes a bug that caused some tests to fail.
        • CFS is now versioned:
          • CFW writes a version header, and CFR reads it
          • CFW strips the segment name from the filename before writing it
          • CFR back-supports pre-3.1 indexes depending on the existence/absence of the version header.
        • TestBackwardsCompatibility now covers 3.0 indexes as well, and addIndexes* ops.

        The beauty of all this is that IndexWriter no longer needs those transactions, and is now 500 lines of code + jdoc down !

        After we've iterated through this patch, I'll do the same changes on trunk. Backwards support should be much easier there, because we will provide an index migration tool anyway, and so CFW/CFR can always assume they're reading the latest version (at least in 4.0). CFW should probably use CodecUtils in trunk - it cannot be used in 3x because of how CFW works today - writing a VInt first, while CodecUtils assumes an Int. And I don't think it's healthy to do so much changes on 3x.

        Show
        Shai Erera added a comment - Attached patch includes: Fixes a bug that caused some tests to fail. CFS is now versioned: CFW writes a version header, and CFR reads it CFW strips the segment name from the filename before writing it CFR back-supports pre-3.1 indexes depending on the existence/absence of the version header. TestBackwardsCompatibility now covers 3.0 indexes as well, and addIndexes* ops. The beauty of all this is that IndexWriter no longer needs those transactions, and is now 500 lines of code + jdoc down ! After we've iterated through this patch, I'll do the same changes on trunk. Backwards support should be much easier there, because we will provide an index migration tool anyway, and so CFW/CFR can always assume they're reading the latest version (at least in 4.0). CFW should probably use CodecUtils in trunk - it cannot be used in 3x because of how CFW works today - writing a VInt first, while CodecUtils assumes an Int. And I don't think it's healthy to do so much changes on 3x.
        Hide
        Michael McCandless added a comment -

        Patch looks great! So awesome seeing all the -'s in IW.java!! Keep it up

        And it's great that you added 3.0 back compat case to
        TestBackwardsCompatibility...

        Some feedback:

        • Can you change the code to read to a "int firstInt" instead of
          version? And make an explicit version (say "PRE_VERSION"), and
          then check if version is PRE_VERSION in the code. Ie, any tests
          against version (eg version > 0) should be against constants
          (version == PRE_VEFRSION) not against 0.
        • CFW's comment should be "make it 1 lower" than the current one
          right? Ie, -2 is the next version?
        Show
        Michael McCandless added a comment - Patch looks great! So awesome seeing all the -'s in IW.java!! Keep it up And it's great that you added 3.0 back compat case to TestBackwardsCompatibility... Some feedback: Can you change the code to read to a "int firstInt" instead of version? And make an explicit version (say "PRE_VERSION"), and then check if version is PRE_VERSION in the code. Ie, any tests against version (eg version > 0) should be against constants (version == PRE_VEFRSION) not against 0. CFW's comment should be "make it 1 lower" than the current one right? Ie, -2 is the next version?
        Hide
        Michael McCandless added a comment -

        Backwards support should be much easier there, because we will provide an index migration tool anyway, and so CFW/CFR can always assume they're reading the latest version (at least in 4.0).

        Hmm I think we should do live migration for this (ie don't require a
        migration tool to fix your index)? This is trivial to do on the fly
        right (ie as you've done in 3.x).

        CFW should probably use CodecUtils in trunk - it cannot be used in 3x because of how CFW works today - writing a VInt first, while CodecUtils assumes an Int. And I don't think it's healthy to do so much changes on 3x.

        Hmm yeah because of the live migration I think CodecUtils is not
        actually a fit here (trunk or 3x).

        Show
        Michael McCandless added a comment - Backwards support should be much easier there, because we will provide an index migration tool anyway, and so CFW/CFR can always assume they're reading the latest version (at least in 4.0). Hmm I think we should do live migration for this (ie don't require a migration tool to fix your index)? This is trivial to do on the fly right (ie as you've done in 3.x). CFW should probably use CodecUtils in trunk - it cannot be used in 3x because of how CFW works today - writing a VInt first, while CodecUtils assumes an Int. And I don't think it's healthy to do so much changes on 3x. Hmm yeah because of the live migration I think CodecUtils is not actually a fit here (trunk or 3x).
        Hide
        Shai Erera added a comment -

        I'm not sure about the live migration, Mike. First because all the problems I've mentioned about CodecUtils in 3x will apply to live migration of 3.x indexes in 4.0 code. Second, if everyone who upgrades to 4.0 will need to run the migration tool, then why do any work in supporting online migration? What's the benefit? Do u think of a case where someone upgrades to 4.0 w/o migrating his indexes (unless he reindexes of course, in which case there is no problem)?

        I just think it's weird that we support online migration together w/ a migration tool. If we migrate the indexes w/ the tool to include the new format of CFS, then the online migration code won't ever run, right? And not doing this in the tool seems just a waste? I mean the user already migrates his indexes, so why incur the cost of an additional online migration?

        Show
        Shai Erera added a comment - I'm not sure about the live migration, Mike. First because all the problems I've mentioned about CodecUtils in 3x will apply to live migration of 3.x indexes in 4.0 code. Second, if everyone who upgrades to 4.0 will need to run the migration tool, then why do any work in supporting online migration? What's the benefit? Do u think of a case where someone upgrades to 4.0 w/o migrating his indexes (unless he reindexes of course, in which case there is no problem)? I just think it's weird that we support online migration together w/ a migration tool. If we migrate the indexes w/ the tool to include the new format of CFS, then the online migration code won't ever run, right? And not doing this in the tool seems just a waste? I mean the user already migrates his indexes, so why incur the cost of an additional online migration?
        Hide
        Michael McCandless added a comment -

        Sorry – for each major release, I think it'll be either live
        migration or offline migration, but not both.

        So far for 4.0 we haven't had a major enough structural change to the
        index format, that'd make live migration too hard/risky, so, so far I
        think we can offer live migration for 4.0.

        The biggest change was flex, but it has the preflex codec to read (not
        write) the pre-4.0 format... so, so far I think we can still offer
        live migration for 4.0?

        Show
        Michael McCandless added a comment - Sorry – for each major release, I think it'll be either live migration or offline migration, but not both. So far for 4.0 we haven't had a major enough structural change to the index format, that'd make live migration too hard/risky, so, so far I think we can offer live migration for 4.0. The biggest change was flex, but it has the preflex codec to read (not write) the pre-4.0 format... so, so far I think we can still offer live migration for 4.0?
        Hide
        Shai Erera added a comment -

        Ahh, I knew we must be talking past each other . I assumed that the flex changes will go under the migration tool. If we have live migration for it, then I agree we should do live migration here.

        With that behind us, did someone start an API migration guide? Since I remove addIndexesnoOptimize in favor of the new addIndexes, I wanted to document it somewhere. It's a tiny change, so perhaps it can go other the API Changes in CHANGES?

        Show
        Shai Erera added a comment - Ahh, I knew we must be talking past each other . I assumed that the flex changes will go under the migration tool. If we have live migration for it, then I agree we should do live migration here. With that behind us, did someone start an API migration guide? Since I remove addIndexesnoOptimize in favor of the new addIndexes, I wanted to document it somewhere. It's a tiny change, so perhaps it can go other the API Changes in CHANGES?
        Hide
        Michael McCandless added a comment -

        With that behind us, did someone start an API migration guide?

        Not yet, I think? Go for it!

        Show
        Michael McCandless added a comment - With that behind us, did someone start an API migration guide? Not yet, I think? Go for it!
        Hide
        Shai Erera added a comment -

        I will document it in CHANGES under API section. I think the migration guide format will need its own discussion, and I don't want to block that issue. When we've agreed on the format (people have made few suggestions), I don't mind helping w/ porting everything relevant from changes to that guide.

        Show
        Shai Erera added a comment - I will document it in CHANGES under API section. I think the migration guide format will need its own discussion, and I don't want to block that issue. When we've agreed on the format (people have made few suggestions), I don't mind helping w/ porting everything relevant from changes to that guide.
        Hide
        Shai Erera added a comment -

        CFW's comment should be "make it 1 lower"

        Right ! I copied it from FieldsWriter where the versions are kept as positive ints. Will post a patch shortly.

        Show
        Shai Erera added a comment - CFW's comment should be "make it 1 lower" Right ! I copied it from FieldsWriter where the versions are kept as positive ints. Will post a patch shortly.
        Hide
        Shai Erera added a comment -

        Patch applies Mike's comments. I think this is ready to go in. I'd like to commit to 3x before trunk, because there are lots of changes here.

        Show
        Shai Erera added a comment - Patch applies Mike's comments. I think this is ready to go in. I'd like to commit to 3x before trunk, because there are lots of changes here.
        Hide
        Michael McCandless added a comment -

        Could you fix "firstInt' to have a very short life?

        Meaning, you read firstInt, and very quickly use that to assign to version & count, and no longer use it again. Ie, all subsequent checks when loading should be against version, not firstInt...

        Also, can you maybe rename CFW.PRE_VERSION -> CFW.FORMAT_PRE_VERSION? (to match the other FORMAT_X).

        Otherwise looks great!

        Show
        Michael McCandless added a comment - Could you fix "firstInt' to have a very short life? Meaning, you read firstInt, and very quickly use that to assign to version & count, and no longer use it again. Ie, all subsequent checks when loading should be against version, not firstInt... Also, can you maybe rename CFW.PRE_VERSION -> CFW.FORMAT_PRE_VERSION? (to match the other FORMAT_X). Otherwise looks great!
        Hide
        Shai Erera added a comment -

        The only place I see firstInt is used perhaps unnecessarily is in the for-loop. So I've changed the code to look like this:

        int count, version;
        if (firstInt < CompoundFileWriter.FORMAT_PRE_VERSION) {
          count = stream.readVInt();
          version = firstInt;
        } else {
          count = firstInt;
          version = CompoundFileWriter.FORMAT_PRE_VERSION;
        }
        

        And then I query for version == CompoundFileWriter.FORMAT_PRE_VERSION inside the for-loop. Is that what you meant?

        There is a check before all that ensuring that read firstInt does not indicate an index corruption – that should remain as-is, right?

        Show
        Shai Erera added a comment - The only place I see firstInt is used perhaps unnecessarily is in the for-loop. So I've changed the code to look like this: int count, version; if (firstInt < CompoundFileWriter.FORMAT_PRE_VERSION) { count = stream.readVInt(); version = firstInt; } else { count = firstInt; version = CompoundFileWriter.FORMAT_PRE_VERSION; } And then I query for version == CompoundFileWriter.FORMAT_PRE_VERSION inside the for-loop. Is that what you meant? There is a check before all that ensuring that read firstInt does not indicate an index corruption – that should remain as-is, right?
        Hide
        Shai Erera added a comment -

        Update w/ comments. I plan to commit this either later today or tomorrow (and then port it to trunk). So if you haven't done so and want a last chance review - that's your chance.

        Show
        Shai Erera added a comment - Update w/ comments. I plan to commit this either later today or tomorrow (and then port it to trunk). So if you haven't done so and want a last chance review - that's your chance.
        Hide
        Michael McCandless added a comment -

        Patch looks good Shai! Thanks.

        Show
        Michael McCandless added a comment - Patch looks good Shai! Thanks.
        Hide
        Shai Erera added a comment -

        Committed revision 948394 (3x).

        Will now port everything to trunk

        Show
        Shai Erera added a comment - Committed revision 948394 (3x). Will now port everything to trunk
        Hide
        Uwe Schindler added a comment -

        Hi Shai,

        I have seen this only lately. You added a 3.0 Index ZIP to the tests. This conflicts a little bit with trunk, where a 3.0 Index ZIP is already available. I would prefer to keep the "older version" ZIPs equal against each release, so it would be fine, if the trunk-added numerics backwards test could also be in 3.x branch. Would this be possible? You have to just merge the code.

        Also it looks strange that the 3.0 backwards tests now contain also 3.0 index ZIPs, but there is no code for that??? Why have you added this to backwards? The 3.0 backwards tests should only modify this one addindexes test, but not add the zips. Maybe simple delete, they are not used.

        By the way the 3.0 index zip file generation code is in the 3.0 branch, have you edited it there? You should commit the code there so one is able to regenerate the 3.0 ZIPs from the stable 3.0.x branch.

        Show
        Uwe Schindler added a comment - Hi Shai, I have seen this only lately. You added a 3.0 Index ZIP to the tests. This conflicts a little bit with trunk, where a 3.0 Index ZIP is already available. I would prefer to keep the "older version" ZIPs equal against each release, so it would be fine, if the trunk-added numerics backwards test could also be in 3.x branch. Would this be possible? You have to just merge the code. Also it looks strange that the 3.0 backwards tests now contain also 3.0 index ZIPs, but there is no code for that??? Why have you added this to backwards? The 3.0 backwards tests should only modify this one addindexes test, but not add the zips. Maybe simple delete, they are not used. By the way the 3.0 index zip file generation code is in the 3.0 branch, have you edited it there? You should commit the code there so one is able to regenerate the 3.0 ZIPs from the stable 3.0.x branch.
        Hide
        Uwe Schindler added a comment -

        I looked at the code, it simply tests trhat old indexes can be added. Maybe you just copy the trunk ZIPs for 3.0 to the 3x branch to keep them consistent. The files dont seem to be equal.

        Show
        Uwe Schindler added a comment - I looked at the code, it simply tests trhat old indexes can be added. Maybe you just copy the trunk ZIPs for 3.0 to the 3x branch to keep them consistent. The files dont seem to be equal.
        Hide
        Shai Erera added a comment -

        Ok I added the indexes from trunk (didn't know they were there). I've changed CFS to write a version header in the file, so that's why I've added a 3.0 index - to make sure it can be read properly by 3.1. What I've added to TestBackwardsCompatibility are tests to ensure that addIndexes work on old indexes (which was good, because after the changes they weren't !).

        Maybe simple delete, they are not used.

        The testAddIndexes were just added, and the 30 indexes are used. So I cannot delete them (see my comment above)

        By the way the 3.0 index zip file generation code is in the 3.0 branch, have you edited it there?

        Nope, it exists in TestBackwardsCompatibility as commented out, w/ instructions to uncomment. I've used that code.

        Show
        Shai Erera added a comment - Ok I added the indexes from trunk (didn't know they were there). I've changed CFS to write a version header in the file, so that's why I've added a 3.0 index - to make sure it can be read properly by 3.1. What I've added to TestBackwardsCompatibility are tests to ensure that addIndexes work on old indexes (which was good, because after the changes they weren't !). Maybe simple delete, they are not used. The testAddIndexes were just added, and the 30 indexes are used. So I cannot delete them (see my comment above) By the way the 3.0 index zip file generation code is in the 3.0 branch, have you edited it there? Nope, it exists in TestBackwardsCompatibility as commented out, w/ instructions to uncomment. I've used that code.
        Hide
        Shai Erera added a comment -

        While porting the code to trunk, I've noticed that acquireRead/Write, releaseRead/Write, upgradeReadToWrite are either not called anymore, or called in relation to addIndexes. So I think these can be safely removed as well (from 3x and trunk)?

        Show
        Shai Erera added a comment - While porting the code to trunk, I've noticed that acquireRead/Write, releaseRead/Write, upgradeReadToWrite are either not called anymore, or called in relation to addIndexes. So I think these can be safely removed as well (from 3x and trunk)?
        Hide
        Michael McCandless added a comment -

        So I think these can be safely removed as well (from 3x and trunk)?

        I think so!

        Show
        Michael McCandless added a comment - So I think these can be safely removed as well (from 3x and trunk)? I think so!
        Hide
        Shai Erera added a comment -

        Committed revision 948415 (copied the 3.0 indexes from trunk) and removed more unnecessary code from IndexWriter.

        Show
        Shai Erera added a comment - Committed revision 948415 (copied the 3.0 indexes from trunk) and removed more unnecessary code from IndexWriter.
        Hide
        Shai Erera added a comment -

        Like the 3x patch, only this one changes IndexFileNames.segmentFileName to take another parameter for custom names, as well as update some jdocs to match flex (Codecs). I think this is ready to go in.

        Show
        Shai Erera added a comment - Like the 3x patch, only this one changes IndexFileNames.segmentFileName to take another parameter for custom names, as well as update some jdocs to match flex (Codecs). I think this is ready to go in.
        Hide
        Uwe Schindler added a comment -

        Should we not add a 3.1 index (created with HEAD 3.x branch) to the TestBackwardsCompatibility? So we can verify that preflex indexes with new CFS header also work?

        Show
        Uwe Schindler added a comment - Should we not add a 3.1 index (created with HEAD 3.x branch) to the TestBackwardsCompatibility? So we can verify that preflex indexes with new CFS header also work?
        Hide
        Shai Erera added a comment -

        Yes! I'll add them and update the tests. Will post a patch after I get more comments

        Show
        Shai Erera added a comment - Yes! I'll add them and update the tests. Will post a patch after I get more comments
        Hide
        Shai Erera added a comment -

        Hmm ... I've created the indexes using the 3x branch, copied them to trunk and updated TestBackwardsCompatibility to refer to them. All tests pass except for testNumericFields. It fails on both CFS and non-CFS indexes, and so I'm not sure it's related to this issue at all. The failure is this:

        junit.framework.AssertionFailedError: wrong number of hits expected:<1> but was:<0>
        	at org.apache.lucene.index.TestBackwardsCompatibility.testNumericFields(TestBackwardsCompatibility.java:773)
        

        Can you try to run it on your checkout?

        Show
        Shai Erera added a comment - Hmm ... I've created the indexes using the 3x branch, copied them to trunk and updated TestBackwardsCompatibility to refer to them. All tests pass except for testNumericFields. It fails on both CFS and non-CFS indexes, and so I'm not sure it's related to this issue at all. The failure is this: junit.framework.AssertionFailedError: wrong number of hits expected:<1> but was:<0> at org.apache.lucene.index.TestBackwardsCompatibility.testNumericFields(TestBackwardsCompatibility.java:773) Can you try to run it on your checkout?
        Hide
        Uwe Schindler added a comment -

        For me it passes.

        Are you sure that you used the latest checkout of 3x. I added the index generation code yesterday after your last 3x commit. This code was not merged to 3x from trunk, as it was postflex added. This is done sice yesterday:

        Author: uschindler
        Date: Wed May 26 13:13:10 2010
        New Revision: 948420
        
        URL: http://svn.apache.org/viewvc?rev=948420&view=rev
        Log:
        Merge the 3.0 index backwards tests from trunk (numeric field support). This makes it consistent across all branches.
        
        Modified:
            lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/index/   (props changed)
            lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java   (contents, props changed)
        

        I attached the generated ZIP files from my 3x checkout.

        Show
        Uwe Schindler added a comment - For me it passes. Are you sure that you used the latest checkout of 3x. I added the index generation code yesterday after your last 3x commit. This code was not merged to 3x from trunk, as it was postflex added. This is done sice yesterday: Author: uschindler Date: Wed May 26 13:13:10 2010 New Revision: 948420 URL: http://svn.apache.org/viewvc?rev=948420&view=rev Log: Merge the 3.0 index backwards tests from trunk (numeric field support). This makes it consistent across all branches. Modified: lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/index/ (props changed) lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java (contents, props changed) I attached the generated ZIP files from my 3x checkout.
        Hide
        Shai Erera added a comment -

        Yes - after I updated my checkout and re-create the indexes, the test passes. So I will include them with this patch as well.

        Show
        Shai Erera added a comment - Yes - after I updated my checkout and re-create the indexes, the test passes. So I will include them with this patch as well.
        Hide
        Shai Erera added a comment -

        Committed revision 948861 (trunk).

        Show
        Shai Erera added a comment - Committed revision 948861 (trunk).
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development