Lucene - Core
LUCENE-1313

Near Realtime Search (using a built in RAMDirectory)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.1
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Enable near realtime search in Lucene without external
      dependencies. When RAM NRT is enabled, the implementation adds a
      RAMDirectory to IndexWriter. Flushes go to the ramdir unless
      there is no available space. Merges are completed in the ram
      dir until there is no more available ram.

      IW.optimize and IW.commit flush the ramdir to the primary
      directory, all other operations try to keep segments in ram
      until there is no more space.
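
      For context, a minimal sketch of the near-realtime usage pattern this
      issue targets, written against the Lucene 2.9-era IndexWriter.getReader()
      API from LUCENE-1516; with RAM NRT enabled, the freshly flushed segments
      behind this call would live in the built-in RAMDirectory rather than the
      primary directory:

        import java.io.File;

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        public class NrtExample {
          public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("index"));
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);

            Document doc = new Document();
            doc.add(new Field("id", "1", Field.Store.YES,
                Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);

            // Near-realtime: a reader over the writer's uncommitted state.
            IndexReader reader = writer.getReader();
            System.out.println("numDocs=" + reader.numDocs());
            reader.close();
            writer.close();
          }
        }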

      1. LUCENE-1313.jar
        5 kB
        Jason Rutherglen
      2. lucene-1313.patch
        474 kB
        Jason Rutherglen
      3. lucene-1313.patch
        698 kB
        Jason Rutherglen
      4. lucene-1313.patch
        680 kB
        Jason Rutherglen
      5. lucene-1313.patch
        1.88 MB
        Jason Rutherglen
      6. LUCENE-1313.patch
        46 kB
        Jason Rutherglen
      7. LUCENE-1313.patch
        35 kB
        Jason Rutherglen
      8. LUCENE-1313.patch
        60 kB
        Jason Rutherglen
      9. LUCENE-1313.patch
        39 kB
        Jason Rutherglen
      10. LUCENE-1313.patch
        37 kB
        Jason Rutherglen
      11. LUCENE-1313.patch
        36 kB
        Jason Rutherglen
      12. LUCENE-1313.patch
        22 kB
        Jason Rutherglen
      13. LUCENE-1313.patch
        22 kB
        Jason Rutherglen
      14. LUCENE-1313.patch
        14 kB
        Jason Rutherglen
      15. LUCENE-1313.patch
        36 kB
        Jason Rutherglen
      16. LUCENE-1313.patch
        33 kB
        Jason Rutherglen
      17. LUCENE-1313.patch
        175 kB
        Jason Rutherglen
      18. LUCENE-1313.patch
        166 kB
        Jason Rutherglen
      19. LUCENE-1313.patch
        175 kB
        Jason Rutherglen
      20. LUCENE-1313.patch
        175 kB
        Jason Rutherglen
      21. LUCENE-1313.patch
        173 kB
        Jason Rutherglen
      22. LUCENE-1313.patch
        170 kB
        Jason Rutherglen
      23. LUCENE-1313.patch
        131 kB
        Jason Rutherglen
      24. LUCENE-1313.patch
        129 kB
        Jason Rutherglen
      25. LUCENE-1313.patch
        109 kB
        Jason Rutherglen
      26. LUCENE-1313.patch
        69 kB
        Jason Rutherglen
      27. LUCENE-1313.patch
        47 kB
        Jason Rutherglen
      28. LUCENE-1313.patch
        52 kB
        Jason Rutherglen
      29. LUCENE-1313.patch
        49 kB
        Jason Rutherglen
      30. LUCENE-1313.patch
        37 kB
        Jason Rutherglen
      31. LUCENE-1313.patch
        21 kB
        Jason Rutherglen
      32. LUCENE-1313.patch
        14 kB
        Jason Rutherglen
      33. LUCENE-1313.patch
        12 kB
        Jason Rutherglen
      34. LUCENE-1313.patch
        473 kB
        Jason Rutherglen
      35. TestLuceneNRT.java
        6 kB
        Jingkei Ly

        Issue Links

          Activity

          Jason Rutherglen added a comment -

          lucene-1313.patch

          Patch includes libraries:

          • commons-io-1.3.2.jar
          • commons-lang-2.3.jar
          • jdom.jar
          • slf4j-api-1.5.2.jar
          • slf4j-simple-1.5.2.jar
          • source from http://reader.imagero.com/uio/
          • source from net.sourceforge.jsorter
          Jason Rutherglen added a comment -

          lucene-1313.patch

           Removed http://reader.imagero.com/uio/ code as it routinely corrupted the log. Its replacement is RandomAccessFile. Added MultiThreadSearcherPolicy, which is used to create a multithreaded Searcher. Transaction multithreading has been removed because it makes debugging hard; it will be optional in the future. Many bugs have been fixed. TestSearch tests for deletes. Index directories are now of the form "2_index": the index id plus the suffix "_index".

          Jason Rutherglen added a comment -

          lucene-1313.patch

           Depends on LUCENE-1312 and LUCENE-1314. More bugs fixed. Deletes are committed to indexes only intermittently, which improves the update speed dramatically. MaybeMergeIndexes now runs via a background timer.

           Will remove writing a snapshot.xml file per transaction in favor of a human-readable log. Creating and deleting these small files is a bottleneck for update speed; this way a transaction writes to only 2 files. The merges happen in the background and so never affect the transaction update speed. I am not sure how useful it would be, but it is possible to have a priority-based IO system that favors transactions over merges: if a transaction is coming in and a merge is happening to disk, the merge is stopped, the transaction IO runs, then the merge IO continues.

           I am not sure how to handle Documents with Fields that have a TokenStream as the value, as I believe these cannot be serialized. For now I assume it will be unsupported.

           Also not sure how to handle analyzers; are these generally serializable? It would be useful to serialize them for a more automated log recovery process.

          Jason Rutherglen added a comment -

          lucene-1313.patch

          • Depends on LUCENE-1314
          • OceanSegmentReader implements reuse of deletedDocs bytes in conjunction with LUCENE-1314
          • Snapshot logging happens to a rolling log file
          • CRC32 checking added to transaction log
           • Added a TestSystem test case that performs adds, updates and deletes. TestSystem uses arbitrarily small settings to force the various background merges to happen within a minimal number of transactions
           • Transactions with over N documents are encoded as a segment (via RAMDirectory) in the transaction log rather than serialized as Documents
          • Started wiki page http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked from http://wiki.apache.org/lucene-java/LuceneResources. Will place documentation there.
          • Document fields with Reader or TokenStream values supported

           Began work on LargeBatch functionality; it needs a test case. Large batches allow adding documents in bulk (also performing deletes) in a transaction that goes straight to an index, bypassing the transaction log. This provides the same speed as using IndexWriter to perform bulk Document processing in Ocean.

           Started OceanDatabase, which will offer a Java API inspired by GData. It will offer optimistic concurrency (something required in a realtime search system) and dynamic object mapping (meaning types such as long, date and double will be mapped to string terms using some Solr code). A file sync is performed after each transaction; I will add an option to allow syncing after N transactions, like MySQL. This will improve realtime update speeds.

          Future:

           • Support for multiple servers by implementing a distributed API and replication using LUCENE-1336
          • Test case that is akin to TestStressIndexing2 mainly to test threading
          • Add LargeBatch test to TestSystem
          • Facets
          • Looking at adding GData compatible XML over HTTP API. Possibly can reuse the old Lucene GData code.
          • Integrate tag index when it's completed
          • Add LRU record cache to transaction log which will be useful for faster replication
          Karl Wettin added a comment -

          Hi Jason,

           I took an initial look at your code last night. I didn't actually execute anything, just followed method calls around to see what it was up to.

           My first comment is sort of boring, but there are virtually no javadocs for the core classes such as TransactionSystem, Batch and Index. It would be great if there was a bit at the class level explaining what classes they interact with and how. It would also be very helpful if there were method-level javadocs for at least the top-level commit-related logic.

           One thing that caught my attention early is this method in TransactionSystem:

            public OceanSearcher getSearcher() throws IOException {
              Snapshot snapshot = snapshots.getLatestSnapshot();
              if (searcherPolicy instanceof SingleThreadSearcherPolicy) {
                return new OceanSearcher(snapshot);
              } else {
                return new OceanMultiThreadSearcher(snapshot, searchThreadPool);
              }
            }
          

           Am I supposed to call this method for each query (as suggested by the method name), or is this a factory method used to update my own Searcher instance after committing documents to the index (as suggested by the code)?

           It's not such a big deal, but I personally think you should refactor the instanceof to a Policy.searcherFactory method, or perhaps even a SearcherPolicyVisitor. Actually, this goes for a few other places in the module too: you have used instanceof and unchecked casting a bit more extensively to solve problems than I would have. But as it does not seem to be used in places where it would be a costly thing to do, these comments are merely about code readability and gut feelings about future problems.
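
           (For illustration, a hedged sketch of the suggested factory
           refactor; the SearcherPolicy interface and createSearcher method
           are hypothetical names, and the patch's Snapshot/OceanSearcher
           classes are stubbed in so the snippet compiles standalone:)

             import java.io.IOException;

             // Stub stand-ins for the patch's classes, only so this sketch compiles.
             class Snapshot {}
             class OceanSearcher {
               OceanSearcher(Snapshot snapshot) {}
             }
             class OceanMultiThreadSearcher extends OceanSearcher {
               OceanMultiThreadSearcher(Snapshot snapshot, Object threadPool) {
                 super(snapshot);
               }
             }

             // Each policy builds its own searcher, so TransactionSystem.getSearcher()
             // needs no instanceof check; it becomes:
             //   return searcherPolicy.createSearcher(snapshots.getLatestSnapshot());
             interface SearcherPolicy {
               OceanSearcher createSearcher(Snapshot snapshot) throws IOException;
             }

             class SingleThreadSearcherPolicy implements SearcherPolicy {
               public OceanSearcher createSearcher(Snapshot snapshot) {
                 return new OceanSearcher(snapshot);
               }
             }

             class MultiThreadSearcherPolicy implements SearcherPolicy {
               private final Object searchThreadPool = new Object(); // stub pool
               public OceanSearcher createSearcher(Snapshot snapshot) {
                 return new OceanMultiThreadSearcher(snapshot, searchThreadPool);
               }
             }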

           I'm a bit concerned about the potential loss of data while documents only reside in InstantiatedIndex or RAMDirectory. I think I'd like an option for some sort of transaction log that could be replayed in case of a crash. I think the easiest way would be to convert all documents to be pre-analyzed (field.tokenStream) before passing them on to the instantiated writer. I don't know how many resources that might consume, but it would make me feel safer.

          karl

          Jason Rutherglen added a comment -

          Hi Karl,

           Thanks for taking a look at the code! Yes, the methods need javadoc. I was waiting to see whether I had settled on them, and since I have started building new code on top, I guess they have settled, so I need to add javadoc to them.

           If you are using TransactionSystem then the getSearcher method would be called for each query. I have developed OceanDatabase, which makes the searching transparent and implements optimistic concurrency (a version number stored in the document). I believe most systems will want to use OceanDatabase; however, the raw TransactionSystem, which is more like IndexWriter, will be left as well. I have been working on OceanDatabase and have neglected the javadocs of TransactionSystem.

          I modeled the searcherPolicy instanceof code on the MergeScheduler type of system where there is a marker interface that the subclasses implement. I don't mind changing it, or if you want to you can as well. I considered it a minor detail though and admittedly did not spend much time on it. You are welcome to change it.

          The transaction log is replayed on a restart of the system. It repopulates a RamIndex (uses RAMDirectory) on startup based on the max snapshot id of the existing indexes, and replays the transaction log from there.

           I looked at converting documents to a token stream; the problem is that, if the field is stored, it creates redundant storage of the data in the transaction log. Ultimately I could not find anything to be gained from storing a token stream. Also, if it was converted, what would happen with stored fields? The issue with replaying the document later, though, is not having the Analyzer. In the distributed object code patch LUCENE-1336 I made Analyzer Serializable. I think it's best to serialize the Analyzer, or create a small database of serialized analyzers that can be called upon during the transaction log recovery process. However, I am not entirely sure about the ramifications of serializing the Analyzer, for example how much data a serialized Analyzer may have. Perhaps others have some ideas or feedback about serializing analyzers.
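
           (For illustration, a minimal sketch of what serializing an
           analyzer for log recovery could look like, assuming a concrete
           Analyzer subclass that implements java.io.Serializable as
           described for LUCENE-1336; stock Lucene analyzers do not:)

             import java.io.ByteArrayInputStream;
             import java.io.ByteArrayOutputStream;
             import java.io.IOException;
             import java.io.ObjectInputStream;
             import java.io.ObjectOutputStream;

             import org.apache.lucene.analysis.Analyzer;

             // Round-trips an analyzer through Java serialization so it can be
             // stored beside the transaction log and restored during recovery.
             public class AnalyzerCodec {
               public static byte[] serialize(Analyzer analyzer) throws IOException {
                 ByteArrayOutputStream bos = new ByteArrayOutputStream();
                 ObjectOutputStream out = new ObjectOutputStream(bos);
                 out.writeObject(analyzer); // requires a Serializable Analyzer
                 out.close();
                 return bos.toByteArray();
               }

               public static Analyzer deserialize(byte[] bytes)
                   throws IOException, ClassNotFoundException {
                 ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes));
                 return (Analyzer) in.readObject();
               }
             }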

          In conclusion, I'll add more javadocs. Please feel free to ask more questions!

          Jason

          Jason Rutherglen added a comment -

           Is there a good place to put the javadocs on the Apache website once they are more complete?

          Jason Rutherglen added a comment -

          LUCENE-1313.patch

           Added javadocs. This still needs LUCENE-1314 completed, which will be divided into multiple patches.

           Jason Rutherglen added a comment - edited

           The patch includes RealtimeIndex, a basic class for performing
           atomic transactional realtime indexing and search. A single
           thread periodically flushes the RAM index to disk. It relies on
           LUCENE-1516.

           We need to benchmark this, specifically 1) realtime w/ramdir
           transaction, 2) realtime w/queued documents transaction, 3)
           normal indexing. Realtime w/ramdir encodes the transaction to a
           RAMDirectory which is added to the RAM writer using
           IW.addIndexesNoOptimize. Option 1 may be slower than option 2;
           however, if the system is replicating it may be the only option?

          Long term I believe we need to implement searching over the
          IndexWriter ram buffer (if possible). However I am not sure how
          option 1 and replication would work with it?

          Michael McCandless added a comment -

          Jason, your last patch looks like it's taking the "flush first to RAM Dir" approach I just described as the next step (on the java-dev thread), right? Which is great!

          So this has no external dependencies, right? And it simply layers on top of LUCENE-1516.

          I'd be very interested to compare (benchmark) this approach vs solely LUCENE-1516.

           Could we change this class so that instead of taking a Transaction object, holding adds & deletes, it simply mirrors IndexWriter's API? Ie, I'd like to decouple the performance optimization of "let's flush small segments through a RAMDir first" from the transactional semantics of "I process a transaction atomically, and lock out other threads' transactions". Ie, the transactional restriction could/should layer on top of this performance optimization for near-realtime search?

          Jason Rutherglen added a comment -

          So this has no external dependencies, right?

          Yes.

          I'd be very interested to compare (benchmark) this approach
          vs solely LUCENE-1516.

          Is the .alg using the NearRealtimeReader from LUCENE-1516 our
          best measure of realtime performance?

          the transactional restriction could/should layer on
          top of this performance optimization for near-realtime search?

           The transactional system should be able to support both methods.
           Perhaps a non-locking setting would allow the same RealtimeIndex
           class to support both modes of operation?

          Jason Rutherglen added a comment -

          We'll need to integrate the RAM based indexer into IndexWriter
          to carry over the deletes to the ram index while it's copied to
          disk. This is similar to IndexWriter.commitMergedDeletes
          carrying deletes over at the segment reader level based on a
          comparison of the current reader and the cloned reader.
           Otherwise there are redundant deletions to the disk index using
           IW.deleteDocuments, which can be unnecessarily expensive. To make
           this external we would need to do the delete by doc id genealogy.

          Michael McCandless added a comment -

          > I'd be very interested to compare (benchmark) this approach
          > vs solely LUCENE-1516.

          Is the .alg using the NearRealtimeReader from LUCENE-1516 our
          best measure of realtime performance?

           So far, I think so? You get to set an update rate of (delete + add) docs, eg 50 docs/sec, and a pause time between NRT reopens.

           Still, it's synthetic. If you guys (LinkedIn) have a way to fold some realism into the test, that'd be great, if only "our app ingests at X docs(MB)/sec and reopens the NRT reader X times per second" to set our ballpark.

          > the transactional restriction could/should layer on
          > top of this performance optimization for near-realtime search?

          The transactional system should be able to support both methods.
          Perhaps a non-locking setting would allow the same RealtimeIndex
          class support both modes of operation?

          Sorry, what are both "modes" of operation?

           I think there are two different "layers" here – the first layer optimizes NRT by flushing small segments to a RAMDir first. This seems generally useful and in theory has no impact on the API IndexWriter exposes (it's "merely" an internal optimization). The second layer adds this new Transaction object, such that N adds/deletes/commit/re-open NRT reader can be done atomically wrt other pending Transaction objects.

          We'll need to integrate the RAM based indexer into IndexWriter
          to carry over the deletes to the ram index while it's copied to
          disk. This is similar to IndexWriter.commitMergedDeletes
          carrying deletes over at the segment reader level based on a
          comparison of the current reader and the cloned reader.
          Otherwise there's redundant deletions to the disk index using
          IW.deleteDocuments which can be unnecessarily expensive. To make
          external we would need to do the delete by doc id genealogy.

           Right, I think the RAMDir optimization would go directly into IW, if we can separate it out from Transaction. It could naturally derive from the existing RAMBufferSizeMB, ie if NRT forces a flush, so long as it's tiny, put it into the local RAMDir instead of the actual Dir, then "deduct" that size from the allowed budget of DW's ram usage. When RAMDir + DW exceeds RAMBufferSizeMB, we then merge all of RAMDir's segments into a "real" segment in the directory.
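
           (A minimal sketch of that budget rule; every name below is a
           hypothetical stand-in for IndexWriter internals, with
           RAMDirectory.sizeInBytes() being the only real API used:)

             import java.io.IOException;

             import org.apache.lucene.store.Directory;
             import org.apache.lucene.store.RAMDirectory;

             // Flush tiny NRT segments into a private RAMDirectory until the
             // combined RAM usage exceeds the configured buffer, then merge the
             // RAM segments into the real directory.
             abstract class RamBudgetSketch {
               final RAMDirectory ramDir = new RAMDirectory();
               Directory primaryDir;          // the "real" index directory
               double ramBufferSizeMB = 16.0; // mirrors setRAMBufferSizeMB

               void flushForNrt() throws IOException {
                 long budget = (long) (ramBufferSizeMB * 1024 * 1024);
                 long used = docWriterBytesUsed() + ramDir.sizeInBytes();
                 if (used < budget) {
                   flushNewSegmentTo(ramDir);      // tiny segment stays in RAM
                 } else {
                   mergeRamSegmentsTo(primaryDir); // budget exceeded: move to disk
                   flushNewSegmentTo(primaryDir);
                 }
               }

               abstract long docWriterBytesUsed();
               abstract void flushNewSegmentTo(Directory dir) throws IOException;
               abstract void mergeRamSegmentsTo(Directory dir) throws IOException;
             }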

          Jason Rutherglen added a comment -

           Latest realtime code; transactions are removed.

           • Needs to be benchmarked
           • There could be concurrency issues around deletes that occur
             while directories are being flushed to disk.
           • It's JARed to include the files and directory structure.
             The patch relies on LUCENE-1516, which if included would make
             the changes incomprehensible
          Jason Rutherglen added a comment -

           Still, it's synthetic. If you guys (LinkedIn) have a way
           to fold some realism into the test, that'd be great, if only
           "our app ingests at X docs(MB)/sec and reopens the NRT reader X
           times per second" to set our ballpark.

          The test we need to progress to is running the indexing side
          endlessly while also reopening every X seconds, then
          concurrently running searches. This way we can play with a bunch
          of settings (mergescheduler threads, merge factors, max merge
          docs, etc), use the python code to generate a dozen cases,
          execute them and find out what seems optimal for our corpus.
          It's a bit of work but probably the only way each Lucene user
          can conclusively say they have the optimal settings needed for
          their system. Usually there is a baseline QPS that is desired,
          where the reopen delay may be increased to accommodate a lack of
          QPS.
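
           (A sketch of that test shape, with the document source and query
           as placeholders; one thread indexes endlessly while the main loop
           reopens an NRT reader every reopenMs and searches against it:)

             import org.apache.lucene.document.Document;
             import org.apache.lucene.index.IndexReader;
             import org.apache.lucene.index.IndexWriter;
             import org.apache.lucene.search.IndexSearcher;
             import org.apache.lucene.search.MatchAllDocsQuery;

             class NrtStressSketch {
               static void run(final IndexWriter writer, long reopenMs,
                   long durationMs) throws Exception {
                 Thread indexer = new Thread() {
                   public void run() {
                     try {
                       while (!interrupted()) {
                         writer.addDocument(new Document()); // placeholder doc
                       }
                     } catch (Exception e) {
                       throw new RuntimeException(e);
                     }
                   }
                 };
                 indexer.start();

                 long end = System.currentTimeMillis() + durationMs;
                 IndexReader reader = writer.getReader();
                 while (System.currentTimeMillis() < end) {
                   Thread.sleep(reopenMs);
                   IndexReader newReader = reader.reopen(); // NRT reopen
                   if (newReader != reader) {
                     reader.close();
                     reader = newReader;
                   }
                   new IndexSearcher(reader).search(new MatchAllDocsQuery(), 10);
                 }
                 indexer.interrupt();
                 indexer.join();
                 reader.close();
               }
             }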

          The ram dir portion of the NRT indexing increases in speed when
          more threads are allocated but those compete with search
          threads, another issue to keep in mind.

          It might be good to add some default charting to
          contrib/benchmark?

          Michael McCandless added a comment -

          The test we need to progress to is running the indexing side
          endlessly while also reopening every X seconds, then
          concurrently running searches

          Do you have a sense of what we'd need to add to contrib/benchmark to make this test possible? LUCENE-1516 takes the first baby step (adds a "NearRealtimeReaderTask").

          Usually there is a baseline QPS that is desired,
          where the reopen delay may be increased to accommodate a lack of
          QPS.

          Right – that's the point I made on java-dev about the "freedom" we have wrt NRT's performance.

          The ram dir portion of the NRT indexing increases in speed when
          more threads are allocated but those compete with search
          threads, another issue to keep in mind.

          Well, single threaded indexing speed is also improved by using RAM dir. Ie the use of RAM dir is orthogonal to the app's use of threads for indexing?

          It might be good to add some default charting to
          contrib/benchmark?

           I've switched to Google's visualization API (http://code.google.com/apis/visualization/) which is a delight (they have a simple-to-use Python wrapper). It'd be awesome to somehow get simple charting folded into benchmark... maybe start w/ sheer data export (as a tab/comma-delimited line file), and then have a separate step that slurps that data in and makes a [Google vis] chart.

          Jason Rutherglen added a comment -

          I added an IndexWriter.getRAMIndex method that returns a
          RAMIndex object that can be updated and flushed to the
          underlying writer. I think this is better than adding more
          methods to IndexWriter and it separates out the logic of the RAM
          based near realtime index and the rest of IW.

           Package-protected IW.addIndexesNoOptimize(DirectoryIndexReader[]
           readers) is added, which is used by RAMIndex.flush. I thought
           this functionality could work for LUCENE-1589 as a public
           method; however, because of the way IndexWriter performs merges
           using segment infos, handling generic IndexReader classes (which
           may not use segmentinfos) would be difficult in the
           addIndexesNoOptimize case.

          I think RAMIndex.flush to the underlying writer is not
          synchronized. If the IW is using ConcurrentMergeScheduler then
          the heavy lifting is performed in the background and so should
          not delay adding more documents to the RAMIndex.

          IW.getReader returns the normal IW reader and the RAMIndex
          reader if there is one.

          The RAMIndex writer can be obtained and modified directly as
          opposed to duplicating the setter methods of IndexWriter such as
          setMergeScheduler.
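
           (A hypothetical usage sketch of the RAMIndex API described above;
           RAMIndex and IW.getRAMIndex come from this patch and are not in
           stock Lucene, so this fragment requires the patch and the method
           names are illustrative only:)

             // Updates go to a RAM-resident index that is searchable through
             // IW.getReader and later flushed into the on-disk writer.
             RAMIndex ram = writer.getRAMIndex();
             ram.addDocument(doc);
             IndexReader reader = writer.getReader(); // disk segments + RAM reader
             // ... search with reader ...
             ram.flush(); // addIndexesNoOptimize into the main writer;
                          // ConcurrentMergeScheduler merges in the background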

          Jason Rutherglen added a comment -
           • The RAMIndex deletes approach changed to be like IndexWriter.
             The deletes are queued in lists, then applied on RI.flush.
           • There is redundancy between IW.delete* and RI.delete*; perhaps
             we don't need RI.delete*?
           • We need more multithreaded tests, probably based on
             TestIndexWriter, to see if we can trigger issues in regard to
             deletes that occur while RI is calling IW.addIndexesNoOptimize.
           • If RI.delete* is removed, do we need a separate RAMIndex class
             to add documents to, or is there a more transparent way for NRT
             ramdir to work? Perhaps we can add an IW.flushToRamDir method
             (whereas IW.flush writes to the IW directory) that flushes the
             rambuffer to the RAMIndex? Some of the issues are around
             swapping out the RAMDir once its segments are flushed to IW. If
             we took this approach, would we need an IW.getReaderRAM method
             that instead of flushing to disk flushes to the ramdir? The
             other problem with the IW.flushToRamDir system is the loss of
             concurrency, where a large rambuffer may be flushing to disk
             while the user really wants small incremental NRT RI-based
             updates at the same time.
          Jason Rutherglen added a comment -

           For this patch I'm debating whether to add a package-protected
           IndexWriter.addIndexWriter method. The problem is, the RAMIndex
           blocks on the write to disk during IW.addIndexesNoOptimize,
           which, if we're using ConcurrentMergeScheduler, shouldn't happen?
           Meaning in this proposed solution, if segments keep piling up
           in RAMIndex, we simply move them over to the disk IW, which will
           in the background take care of merging them away and to disk.

          I don't think it's necessary to immediately write ram segments
          to disk (like the current patch does), instead it's possible to
          simply copy segments over from the incoming IW, leave them in
          RAM and they can be merged to disk as necessary? Then on
          IW.flush any segmentinfo(s) that are not from the current
          directory can be flushed to disk?

          Just thinking out loud about this.

          Michael McCandless added a comment -

          I don't think it's necessary to immediately write ram segments to disk

          I agree: it should be fine from IndexWriter's standpoint if some
          segments live in a private RAMDir and others live in the "real" dir.

          In fact, early versions of LUCENE-843 did exactly this: IW's RAM
          buffer is not as efficient as a written segment, and so you can gain
          some RAM efficiency by flushing first to RAM and then merging to disk.

           I think we could adopt a simple criterion: you flush the new segment to
          the RAM Dir if net RAM used is < maxRamBufferSizeMB. This way no
          further configuration is needed. On auto-flush triggering you then
          must take into account the RAM usage by this RAM Dir. On commit,
          these RAM segments must be migrated to the real dir (preferably by
          forcing a merge, somehow).

          A near realtime reader would also happily mix "real" Dir and RAMDir
          SegmentReaders.

          This should work well I think, and should not require a separate
          RAMIndex class, and won't block things when the RAM segments are
          migrated to disk by CMS.

          Jason Rutherglen added a comment -

           I think we could adopt a simple criterion: you flush the
          new segment to the RAM Dir if net RAM used is <
          maxRamBufferSizeMB. This way no further configuration is needed.
          On auto-flush triggering you then must take into account the RAM
          usage by this RAM Dir.

          So we're ok with the blocking that occurs when the ram buffer is
          flushed to the ramdir?

          On commit, these RAM segments must be migrated to the
          real dir (preferably by forcing a merge, somehow).

          This is pretty much like resolveExternalSegments which would be
          called in prepareCommit? This could make calls to commit much
          more time consuming. It may be confusing to the user why
          IW.flush doesn't copy the ram segments to disk.

          A near realtime reader would also happily mix "real" Dir
          and RAMDir SegmentReaders.

          Agreed, however the IW.getReader MultiSegmentReader removes
          readers from another directory so we'd need to add a new
          attribute to segmentinfo that marks it as ok for inclusion in
          the MSR?

          Michael McCandless added a comment -

          So we're ok with the blocking that occurs when the ram buffer is
          flushed to the ramdir?

           Well... we don't have a choice (unless/until we implement an IndexReader impl to directly search the RAM buffer). Still, this should be a good improvement over the blocking when flushing to a real dir.

          This is pretty much like resolveExternalSegments which would be
          called in prepareCommit? This could make calls to commit much
          more time consuming. It may be confusing to the user why
          IW.flush doesn't copy the ram segments to disk.

          Similar... the difference is I'd prefer to do a merge of the RAM segments vs the straight one-for-one copy that resolveExternalSegments does.

          commit would only become more time consuming in the NRT case? IE we'd only flush-to-RAMdir if it's getReader that's forcing the flush? In which case, I think it's fine that commit gets more costly. Also, I wouldn't expect it to be much more costly: we are doing an in-memory merge of N segments, writing one segment to the "real" directory. Vs writing each tiny segment as a real one. In fact, commit could get cheaper (when compared to not making this change) since there are fewer new files to fsync.

          Agreed, however the IW.getReader MultiSegmentReader removes
          readers from another directory so we'd need to add a new
          attribute to segmentinfo that marks it as ok for inclusion in
          the MSR?

          Or, fix that filtering to also accept IndexWriter's RAMDir.

          Jason Rutherglen added a comment -

          I'm confused as to how we make DocumentsWriter switch from
          writing to disk vs the ramdir? It seems like a fairly major
          change to the system? One that's hard to switch later on after
          IW is instantiated? Perhaps the IW.addWriter method is easier in
          this regard?

          the difference is I'd prefer to do a merge of the RAM
          segments vs the straight one-for-one copy that
          resolveExternalSegments does.

          Yeah I implemented it this way in the IW.addWriter code. I agree
          it's better for IW.commit to copy all the ramdir segments to one
          disk segment.

          I started working on the IW.addWriter(IndexWriter, boolean
          removeFrom) where removeFrom removes the segments that have been
          copied to the destination writer from the source writer. This
          method gets around the issue of blocking because potentially
          several writers could concurrently be copied to the destination
          writer. The only issue at this point is how the destination
          writer obtains segmentreaders from source readers when they're
          in the other writers' pool? Maybe the SegmentInfo can have a
          reference to the writer it originated in? That way we can easily
          access the right reader pool when we need it?

          Michael McCandless added a comment -

          I'm confused as to how we make DocumentsWriter switch from
          writing to disk vs the ramdir? It seems like a fairly major
          change to the system? One that's hard to switch later on after
          IW is instantiated? Perhaps the IW.addWriter method is easier in
          this regard?

          When we create SegmentWriteState (which is supposed to contain all
          details needed to tell DW how/where to write the segment), we'd set
          its directory to the RAMDir? That ought to be all that's needed
          (though, it's possible some places use a private copy of the original
          directory, which we should fix). DW should care less which Directory
          the segment is written to...

          > the difference is I'd prefer to do a merge of the RAM
          > segments vs the straight one-for-one copy that
          > resolveExternalSegments does.

          Yeah I implemented it this way in the IW.addWriter code. I agree
          it's better for IW.commit to copy all the ramdir segments to one
          disk segment.

          OK. Maybe we modify resolveExternalSegments to accept a "doMerge"?

          I started working on the IW.addWriter(IndexWriter, boolean
          removeFrom) where removeFrom removes the segments that have been
          copied to the destination writer from the source writer. This
          method gets around the issue of blocking because potentially
          several writers could concurrently be copied to the destination
          writer. The only issue at this point is how the destination
          writer obtains segmentreaders from source readers when they're
          in the other writers' pool? Maybe the SegmentInfo can have a
          reference to the writer it originated in? That way we can easily
          access the right reader pool when we need it?

          I don't think we need two writers? I think one writer, sometimes
          flushing to RAMDir, is a clean solution?

          Michael McCandless added a comment -

          One separate optimization we should make with NRT is to not close the doc store (stored fields, term vector) files when flushing for an NRT reader.

          We do close them now, which then makes merging quite a bit more costly.

          The trickiness with this optimization is we'd need to be able to somehow share an IndexInput & IndexOutput; or, perhaps we can open an IndexInput even though an IndexOutput has the same file open (Windows may prevent this, though I think I've seen that it will in fact allow it).

          Once we do that optimization, then with this RAMDir optimization we should try to have the doc store files punch straight through to the real directory, ie bypass the RAMDir. The doc stores are space consuming, and since with autoCommit=false we can bypass merging them, it makes no sense to store them in the RAMDir.

          We should probably do this optimization as a "phase 2", after this one.

          Jason Rutherglen added a comment -

          When we create SegmentWriteState (which is supposed to
          contain all details needed to tell DW how/where to write the
          segment), we'd set its directory to the RAMDir? That ought to be
          all that's needed (though, it's possible some places use a
          private copy of the original directory, which we should fix). DW
          shouldn't care which Directory the segment is written to...

          Agreed that DW can write the segment to the RAMDir. I started
          coding along these lines; however, what do we do about the RAMDir
          merging? This is why I was thinking we'll need a separate IW?
          Otherwise the ram segments (if they are treated the same as disk
          segments) would quickly be merged to disk? Or we have two
          separate merging paths?

          If we have a disk IW and ram IW, I'm not sure how the docstores
          to disk part would work though I'm sure there's some way to do
          it.

          modify resolveExternalSegments to accept a "doMerge"?

          Sounds good.

          Jason Rutherglen added a comment -

          we should make with NRT is to not close the doc store
          (stored fields, term vector) files when flushing for an NRT
          reader.

          Agreed, I think this feature is a must; otherwise we're doing
          unnecessary in-RAM merging.

          we'd need to be able to somehow share an IndexInput &
          IndexOutput; or, perhaps we can open an IndexInput even though
          an IndexOutput

          I ran into problems with this before, when I was trying to reuse
          Directory to write a transaction log. It seemed theoretically
          doable; however, it didn't work in practice. It could have been
          the seeking and replacing, but I don't remember. FSIndexOutput
          uses a writeable RAF and FSIndexInput is read-only, so why would
          there be an issue?
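
          (A quick standalone probe of that question using plain java.io
          rather than FSIndexInput/FSIndexOutput; behavior is OS-dependent,
          and Windows may refuse the second open:)

            import java.io.File;
            import java.io.FileOutputStream;
            import java.io.RandomAccessFile;

            public class SharedFileProbe {
              public static void main(String[] args) throws Exception {
                File f = new File("probe.bin");
                FileOutputStream out = new FileOutputStream(f); // writer stays open
                out.write(new byte[] {1, 2, 3});
                out.flush();

                RandomAccessFile in = new RandomAccessFile(f, "r"); // read-only view
                System.out.println("first byte = " + in.read());   // expect 1

                in.close();
                out.close();
                f.delete();
              }
            }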

          Jason Rutherglen added a comment -

          doc store files punch straight through to the real
          directory

          To implement this functionality in parallel (and perhaps make
          the overall patch cleaner), writing doc stores directly to a
          separate directory can be a different patch? There can be an
          option IW.setDocStoresDirectory(Directory) that the patch
          implements? Then some unit tests that are separate from the near
          realtime portion.

          Michael McCandless added a comment -

          Agreed that DW can write the segment to the RAMDir. I started
          coding along these lines; however, what do we do about the RAMDir
          merging? This is why I was thinking we'll need a separate IW?
          Otherwise the ram segments (if they are treated the same as disk
          segments) would quickly be merged to disk? Or we have two
          separate merging paths?

          Hmm, right. We could exclude RAMDir segments from consideration by
          MergePolicy? Alternatively, we could "expect" the MergePolicy to
          recognize this and be smart about choosing merges (ie don't mix
          merges)?

          EG we do in fact want some merging of the RAM segments if they get too
          numerous (since that will impact search performance).

          > we should make with NRT is to not close the doc store
          > (stored fields, term vector) files when flushing for an NRT
          > reader.

          Agreed, I think this feature is a must; otherwise we're doing
          unnecessary in-RAM merging.

          OK, let's do this as a separate issue/optimization for NRT. There are
          two separate parts to it:

          • Ability to store doc stores in "real" directory (looks like you
            opened LUCENE-1618 for this part).
          • Ability to "share" IndexOutput & IndexInput

          I ran into problems with this before, when I was trying to reuse
          Directory to write a transaction log. It seemed theoretically
          doable; however, it didn't work in practice. It could have been
          the seeking and replacing, but I don't remember. FSIndexOutput
          uses a writeable RAF and FSIndexInput is read-only, so why would
          there be an issue?

          Hmm... seems like we need to investigate further. We could either
          "ask" an IndexOutput for its IndexInput (sharing the underlying RAF),
          or try to separately open an IndexInput (which may not work on
          Windows).

          To implement this functionality in parallel (and perhaps make
          the overall patch cleaner), writing doc stores directly to a
          separate directory can be a different patch? There can be an
          option IW.setDocStoresDirectory(Directory) that the patch
          implements? Then some unit tests that are separate from the near
          realtime portion.

          Yes, separate issue (LUCENE-1618).

          Michael McCandless added a comment -

          So let's leave this issue focused on sometimes using RAMDir for newly created segments.

          Jason Rutherglen added a comment -

          We could exclude RAMDir segments from consideration by
          MergePolicy? Alternatively, we could "expect" the MergePolicy to
          recognize this and be smart about choosing merges (ie don't mix
          merges)?

          Is this overcomplicating things? Sometimes we want a mixture of
          RAMDir segments and FSDir segments to merge (when we've decided
          we have too much in ram), sometimes we don't (when we just want
          the ram segments to merge). I'm still a little confused as to
          why having a wrapper class that manages a disk writer and a ram
          writer isn't cleaner?

          Michael McCandless added a comment -

          Sometimes we want a mixture of
          RAMDir segments and FSDir segments to merge (when we've decided
          we have too much in ram),

          I don't think we want to mix RAM & disk merging?

          EG when RAM is full, we want to quickly flush it to disk as a single
          segment. Merging with disk segments only makes that flush slower?

          I'm still a little confused as to
          why having a wrapper class that manages a disk writer and a ram
          writer isn't cleaner?

          This is functionally the same as not mixing RAM vs disk merging,
          right (ie just as "clean")?

          Michael McCandless added a comment -

          Yonik raised a good question on LUCENE-1618, which is what gains do we really expect to see by using RAMDir for the tiny recently flushed segments?

          It would be nice if we could approximately measure this before putting more work into this issue – if the gains are not "decent" this optimization may not be worthwhile.

          Of course, we are talking about 100s of milliseconds for the turnaround time to add docs & open an NRT reader, so if the time for writing/opening many tiny files in RAMDir vs FSDir differs by say 10s of msecs then we should pursue this. We should also consider that the IO system may very well be quite busy (doing merge(s), backups, etc.) and that'd make it slower to have to create tiny files.
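
          (A rough way to measure this, sketched against the 3.x-era
          Directory API; the numbers will vary heavily with OS, file system,
          and caching:)

            import java.io.File;
            import org.apache.lucene.store.Directory;
            import org.apache.lucene.store.FSDirectory;
            import org.apache.lucene.store.IndexInput;
            import org.apache.lucene.store.IndexOutput;
            import org.apache.lucene.store.RAMDirectory;

            public class TinyFileBench {
              // create/write/close/open/read/delete many ~1 KB files
              static long bench(Directory dir, int files) throws Exception {
                byte[] buf = new byte[1024];
                long t0 = System.nanoTime();
                for (int i = 0; i < files; i++) {
                  String name = "tiny" + i;
                  IndexOutput out = dir.createOutput(name);
                  out.writeBytes(buf, buf.length);
                  out.close();
                  IndexInput in = dir.openInput(name);
                  in.readBytes(buf, 0, buf.length);
                  in.close();
                  dir.deleteFile(name);
                }
                return (System.nanoTime() - t0) / 1000000; // msec
              }

              public static void main(String[] args) throws Exception {
                System.out.println("RAMDir: " + bench(new RAMDirectory(), 1000) + " ms");
                Directory fs = FSDirectory.open(new File("benchdir"));
                System.out.println("FSDir:  " + bench(fs, 1000) + " ms");
                fs.close();
              }
            }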

          A simpler optimization might be to allow using CFS for tiny files (even when CFS is turned off), but build the CFS in RAM (ie, write tiny files first to RAMFiles, then make the CFS file on disk). That might get most of the gains since the FSDir sees only one file created per tiny segment, not N.

          Yonik Seeley added a comment -

          Yonik raised a good question on LUCENE-1618, which is what gains do we really expect to see by using RAMDir for the tiny recently flushed segments?

          I raised it more because of the direction the discussion was veering (write through caching to a RAMDirectory, and RAMDirectory being faster to search). I do believe that RAMDirectory can probably improve NRT, but it would be due to avoiding waiting for file open/write/close/open/read (as Mike also said)... and not any difference during IndexSearcher.search(), which should be irrelevant due to the relative size differences of the RAMDirectory and the FSDirectory. Small file creation speeds will also be heavily dependent on the exact file system used.

          Jason Rutherglen added a comment -

          EG when RAM is full, we want to quickly flush it to disk
          as a single segment. Merging with disk segments only makes that
          flush slower?

          I assume it's ok for the IW.mergescheduler to be used which may
          not immediately perform the merge to disk (in the case of
          ConcurrentMergeScheduler)? When implementing using
          addIndexesNoOptimize (which blocks) I realized we probably don't
          want blocking to occur because that means shutting down the
          updates.

          Also a random thought, it seems like ConcurrentMergeScheduler
          works great for RAMDir merging, how does it compare with
          SerialMS on an FSDirectory? It seems like it shouldn't be too much
          faster given the IO sequential access bottleneck?

          Michael McCandless added a comment -

          I assume it's ok for the IW.mergescheduler to be used, which may
          not immediately perform the merge to disk (in the case of
          ConcurrentMergeScheduler)?

          Only if we "accept" requiring MergePolicy to be aware that some
          segments are in RAMDir and some are in the "real" Dir and to "act
          accordingly", ie 1) don't mix the dirs when merging, 2) when RAM is
          "full" merge every single RAM segment into a single "real Dir" segment
          (requires IW to provide exposure on how much RAM DW's buffer is
          currently consuming), 3) properly "maintain" the RAM segments (ie,
          merge RAM -> RAM somehow) so that searchers don't search too many RAM
          segments.

          I think this approach is probably best: you're right that allowing CMS
          to manage these RAM segments is nice since it'll happen in the BG and
          will not block updates.

          It does mean, though, that the RAM usage semantics of IW are no
          longer as "crisp" as flushing is today ("once RAM is full, stop the
          world & flush it to disk, then resume"), but I think that's
          acceptable and perhaps preferable since the world is no longer
          stopped to flush RAM -> disk.

          Though one trickiness is... if a large RAM -> RAM merge takes place,
          we temporarily double the RAM consumption. I think MergePolicy simply
          shouldn't do that. Ie at no point should it be merging a very large
          percentage of the RAM segments. It should instead merge RAM -> disk.

          This'd also mean advanced users that implement their own MergePolicy
          must realize, when IW is used with an NRT reader, that additional
          smarts are recommended wrt the RAM segments.

          When implementing using
          addIndexesNoOptimize (which blocks) I realized we probably don't
          want blocking to occur because that means shutting down the
          updates.

          Right, this is one of the strong reasons to do the "internal" approach
          vs "external" one.

          Also a random thought, it seems like ConcurrentMergeScheduler
          works great for RAMDir merging, how does it compare with
          SerialMS on an FSDirectory? It seems like it shouldn't be too much
          faster given the IO sequential access bottleneck?

          By far the biggest win of CMS over SMS is in the first merge, because
          it does not block the further addition of docs. Thus an app can
          continue indexing into RAM buffer (consuming CPU & RAM resources)
          while a BG thread consumes RAM + IO resources. This is very much a
          win.

          Beyond the first merge...in theory, modern IO systems have concurrency
          (eg the NCQ in a single SATA drive) so you should "gain" by having
          several threads performing IO at once. The OS & hard drives attempt
          to re-order the requests in a more optimal way (like an elevator,
          sweeping floors). I haven't explicitly tested this with Lucene...

          I believe SSDs handle concurrent requests very well since under the
          hood most of them are multi-channel, basically RAID0, devices (eg
          the Intel X25-M has 10 channels).

          Jason Rutherglen added a comment - edited

          A merge policy may be more optimal for ram segments vs disk
          segments. Perhaps the best way to make this clean is to keep the
          ram merge policy and real dir merge policies different? That way
          merge policy implementations don't need to worry about
          ram and non-ram dir cases.

          Perhaps an IW.updatePendingRamMerges method should be added that
          handles this separately? Does the ram dir ever need to worry
          about things like maxNumSegmentsOptimize and optimize?

          Right, this is one of the strong reasons to do the
          "internal" approach vs "external" one.

          I think having the ram merge policy should cover the reasons I
          had for having a separate ram writer. Although the IW.addWriter
          method I implemented would not have blocked, but I don't think
          it's necessary now if we have a separate ram merge policy.

          Michael McCandless added a comment -

          Perhaps the best way to make this clean is to keep the
          ram merge policy and real dir merge policies different? That way
          merge policy implementations don't need to worry about
          ram and non-ram dir cases.

          OK tentatively this feels like a good approach. Would you re-use
          MergePolicy, or make a new RAMMergePolicy?

          Would we use the same MergeScheduler to then execute the selected
          merges?

          How would we handle the "it's time to flush some RAM to disk" case?
          Would RAMMergePolicy make that decision?

          Perhaps an IW.updatePendingRamMerges method should be added that handles this separately?

          Yes?

          Does the ram dir ever need to worry about things like maxNumSegmentsOptimize and optimize?

          No?

          I think having the ram merge policy should cover the reasons I
          had for having a separate ram writer. Although the IW.addWriter
          method I implemented would not have blocked, but I don't think
          it's necessary now if we have a separate ram merge policy.

          OK good.

          Jason Rutherglen added a comment -

          Would you re-use MergePolicy, or make a new
          RAMMergePolicy?

          MergePolicy is used as-is, with a special IW method that handles
          merging ram segments to the real directory (which runs into IW's
          requirement that merged segments be contiguous; can that be
          relaxed in this case, as I don't understand why the restriction
          exists?)

          The patch is not committable; however, I am posting it to show a
          path that seems to work. It includes test cases for merging in
          ram and merging to the real directory.

          • IW.getFlushDirectory is used by internal calls to obtain the
            directory to flush segments to. This is used in DocumentsWriter
            related calls.
          • DocumentsWriter.directory is removed so that methods requiring
            the directory call IW.getFlushDirectory instead.
          • IW.setRAMDirectory sets the ram directory to be used.
          • IW.setRAMMergePolicy sets the merge policy to be used for
            merging segments on the ram dir.
          • In IW.updatePendingMerges totalRamUsed is the size of the ram
            segments + the ram buffer used. If totalRamUsed exceeds the max
            ram buffer size then IW.updatePendingRamMergesToRealDir is
            called.
          • IW.updatePendingRamMergesToRealDir registers a merge of the
            ram segments to the real directory (currently causes a
            non-contiguous segments exception)
          • MergePolicy.OneMerge has a directory attribute used when
            building the merge.info in _mergeInit.
          • Test case includes testMergeInRam, testMergeToDisk,
            testMergeRamExceeded
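
          (Putting the list above together, usage of the patch's proposed
          API might look like this; setRAMDirectory and setRAMMergePolicy
          are patch-only methods, not released Lucene API:)

            IndexWriter writer = new IndexWriter(fsDir, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMDirectory(new RAMDirectory()); // flush target
            writer.setRAMMergePolicy(ramMergePolicy);   // ram -> ram merges
            // Flushes now go to the ram dir; once totalRamUsed exceeds the
            // max ram buffer size, updatePendingRamMergesToRealDir merges
            // the ram segments down to fsDir.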

          There is one error that occurs regularly in testMergeRamExceeded

           MergePolicy selected non-contiguous segments to merge
          (_bo:cx83 _bm:cx4 _bn:cx2 _bl:cx1->_bj _bp:cx1->_bp _bq:cx1->_bp
          _c2:cx1->_c2 _c3:cx1->_c2 _c4:cx1->_c2 vs _5x:c120 _6a:c8
          _6t:c11 _bo:cx83** _bm:cx4** _bn:cx2** _bl:cx1->_bj**
          _bp:cx1->_bp** _bq:cx1->_bp** _c1:c10 _c2:cx1->_c2**
          _c3:cx1->_c2** _c4:cx1->_c2**), which IndexWriter (currently)
          cannot handle 
          Jason Rutherglen added a comment -
          • Ok, fixed the ensureContiguousMerge exception by asking the
            mergePolicy (not ramMergePolicy) to evaluate the ram segment
            infos as an optimize to directory. Now all the current tests
            pass.
          • The patch is cleaned up a little, needs more, and further test
            cases.
          • IndexWriter doesn't accept setRAMDirectory anymore; the ram dir
            needs to be passed into the IndexWriter constructor. This is
            because the ram dir can't be changed in the middle of an
            operation while the system is running.
          Jason Rutherglen added a comment -

          Fixed and cleaned up more.

          All tests pass

          Added entry in CHANGES.txt

          I'm going to integrate LUCENE-1618 and test that out as a part of the next patch.

          Michael McCandless added a comment -

          Patch looks good! Some comments:

          • I don't think the caller should provide the RAMDir... and we
            should have a getter to get it. I think this should be
            under-the-hood. This is simply a nice way for IW to use RAM as
            buffer in the presence of frequent NRT readers being opened.
          • If NRT is never used, the behavior of IW should be unchanged
            (which is not the case w/ this patch I think). RAMDir should be
            created the first time a flush is done due to NRT creation.
          • StoredFieldsWriter & TermVectorsTermsWriter now write to
            IndexWriter.getFlushDirectory(), which is confusing because that
            method returns the RAMDir if set? Shouldn't this be the opposite?
            (Ie it should flush to IndexWriter.getDirectory()? Or we should
            change getFlushDirectory to NOT return the ramdir?)
          • Why did you need to add synchronized to some of the SegmentInfo
            files methods? (What breaks if you undo that?). The contract
            here is IW protects access to SegmentInfo/s.
          • The MergePolicy needs some smarts when it's dealing w/ RAM. EG it
            should not do a merge of more than XXX% of total RAM usage (should
            flush to the real directory instead).
          • Nothing is calling the new ramOverLimit?
          • Still some noise (MockRAMDir, DocFieldProcessorPerThread, some
            changes in LogMergePolicy)
          Jason Rutherglen added a comment -
          • IndexFileDeleter takes into account the ram directory (which
            when using NRT with the FSD caused files to not be found).
          • FSD is included and writes fdx, fdt, tvx, tvf, tvd extension
            files to the primary directory (which is the same as
            IW.directory). LUCENE-1618 needs to be updated with these
            changes (or we simply include it in this patch as the
            LUCENE-1618 patch is only a couple of files).
          • Removed DocumentsWriter.ramOverLimit
          • I think we need to give the option of a ram mergescheduler
            because the user may not want the ram merging and disk
            merging to compete for threads. I'm thinking of the use case
            where NRT is a priority then one may allocate more threads to
            the ram CMS and less to the disk CMS. This also gives us the
            option of trying out more parameters when performing benchmarks
            of NRT.
          • We may want to default the ram mergepolicy to not use compound
            files as it's not useful when using a ram dir?
          • Because FSD uses IW.directory, FSD will list files that
            originated from FSD and from IW.directory, so we may want to keep
            track of which files are supposed to be in FSD (from the
            underlying primary dir) and which are not?
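
          (For the compound-file point above, a sketch of the default;
          setUseCompoundFile and setUseCompoundDocStore are real
          LogMergePolicy methods of this era, while setRAMMergePolicy is the
          patch's proposed API:)

            // 2.9-era constructor shape; it changes across versions
            LogByteSizeMergePolicy ramPolicy = new LogByteSizeMergePolicy(writer);
            ramPolicy.setUseCompoundFile(false);     // no .cfs in the ram dir
            ramPolicy.setUseCompoundDocStore(false); // doc stores go to disk
            writer.setRAMMergePolicy(ramPolicy);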

          If NRT is never used, the behavior of IW should be
          unchanged (which is not the case w/ this patch I think). RAMDir
          should be created the first time a flush is done due to NRT
          creation.

          In the patch if ramdir is not passed in, the behavior of IW
          remains the same as it is today. You're saying we should have IW
          create the ramdir by default after getReader is called and
          remove the IW ramdir constructor? What if the user has an
          alternative ramdir implementation they want to use?

          StoredFieldsWriter & TermVectorsTermsWriter now write to
          IndexWriter.getFlushDirectory(), which is confusing because that
          method returns the RAMDir if set? Shouldn't this be the
          opposite? (Ie it should flush to IndexWriter.getDirectory()? Or
          we should change getFlushDirectory to NOT return the
          ramdir?)

          The attached patch uses FileSwitchDirectory, where these files
          are written to the primary directory (IW.directory). So
          getFlushDirectory is ok?
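
          (The routing described here, sketched with FileSwitchDirectory's
          public constructor; the patch wires this up internally, so this is
          illustration only:)

            import java.io.File;
            import java.util.HashSet;
            import java.util.Set;
            import org.apache.lucene.store.Directory;
            import org.apache.lucene.store.FSDirectory;
            import org.apache.lucene.store.FileSwitchDirectory;
            import org.apache.lucene.store.RAMDirectory;

            public class NrtDirSketch {
              /** Doc store files go straight to disk, the rest to RAM. */
              public static Directory create(File path) throws Exception {
                Set<String> diskExts = new HashSet<String>();
                for (String ext : new String[] {"fdx", "fdt", "tvx", "tvd", "tvf"}) {
                  diskExts.add(ext); // stored fields + term vectors
                }
                Directory disk = FSDirectory.open(path);
                Directory ram = new RAMDirectory();
                // files with diskExts go to 'disk'; all others to 'ram'
                return new FileSwitchDirectory(diskExts, disk, ram, true);
              }
            }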

          Why did you need to add synchronized to some of the
          SegmentInfo files methods? (What breaks if you undo that?). The
          contract here is IW protects access to SegmentInfo/s

          SegmentInfo.files was being cleared while sizeInBytes was called,
          which resulted in an NPE. The alternative is to sync on IW in
          IW.size(SegmentInfos), which seems a bit extreme just to obtain
          the size of a segment info?

          The MergePolicy needs some smarts when it's dealing w/
          RAM. EG it should not do a merge of more than XXX% of total RAM
          usage (should flush to the real directory instead)

          Isn't this handled well enough in updatePendingMerges or is
          there more that needs to be done?

          Michael McCandless added a comment -

          IndexFileDeleter takes into account the ram directory (which
          when using NRT with the FSD caused files to not be found).

          I don't like how "deep" the dichotomy of "RAMDir vs FSDir" is being
          pushed. Why can't we push FSD down to all these places (IFD,
          SegmentInfo/s, etc.)?

          FSD is included and writes fdx, fdt, tvx, tvf, tvd extension
          files to the primary directory (which is the same as
          IW.directory). LUCENE-1618 needs to be updated with these
          changes (or we simply include it in this patch as the
          LUCENE-1618 patch is only a couple of files).

          Why did this require changes to FSD?

          I think we need to give the option of a ram mergescheduler
          because the user may not want the ram merging and disk
          merging to compete for threads. I'm thinking of the use case
          where NRT is a priority then one may allocate more threads to
          the ram CMS and less to the disk CMS. This also gives us the
          option of trying out more parameters when performing benchmarks
          of NRT.

          I think we're unlikely to gain from more than 1 BG thread for RAM
          merging? But I agree it'd be horrible if CMS blocked RAM merging
          because its allotted threads were tied up merging disk segments.
          Could we simply make the single CMS instance smart enough to realize
          that a single RAM merge is allowed to proceed regardless of the thread
          limit?
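
          (Idea sketch only; the names below are assumptions, real CMS
          internals differ, and testing the segment's directory is just one
          way to detect a ram merge:)

            // Hypothetical gate inside a merge scheduler: a ram merge may
            // start even when all allotted threads are busy on disk merges.
            synchronized boolean canStartMerge(MergePolicy.OneMerge merge) {
              boolean ramMerge = dirOf(merge) instanceof RAMDirectory; // assumed helper
              return activeMergeThreads < maxThreadCount || ramMerge;
            }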

          We may want to default the ram mergepolicy to not use compound
          files as it's not useful when using a ram dir?

          I think actually hardwire it, not just default. Building CFS in RAM
          makes no sense. Worse, if we allow one to choose to do it we then
          have to fix FSD to understand CFX must go to the dir too, and, we'd
          have to fix IW to not merge in the doc store files when building a
          private CFS. Net/net I think we should not allow CFS for the RAM
          segments.

          On merging to disk it can then respect the user's CFS setting.

          Because FSD uses IW.directory, FSD will list files that
          originated from FSD and from IW.directory, so we may want to keep
          track of which files are supposed to be in FSD (from the
          underlying primary dir) and which are not?

          I don't understand what's wrong here?

          > If NRT is never used, the behavior of IW should be
          > unchanged (which is not the case w/ this patch I think). RAMDir
          > should be created the first time a flush is done due to NRT
          > creation.

          In the patch if ramdir is not passed in, the behavior of IW
          remains the same as it is today. You're saying we should have IW
          create the ramdir by default after getReader is called and
          remove the IW ramdir constructor?

          Right. This should be "under the hood".

          What if the user has an alternative ramdir implementation they want to
          use?

          I think I'd rather not open up that option just yet. This really is a
          private optimization to how IW uses RAM. We may want to further
          change/improve how RAM is used.

          Way back when, IW used a RAMDir internally for buffering; then, with
          LUCENE-843 we switched to whole different format (DW's ram
          buffering). Now we are adding back RAMDir for NRT; maybe we'll switch
          its format at some point... or change NRT to directly search DW's
          RAM... etc. How IW uses RAM is very much an internal detail so I'd
          rather not expose it publicly.

          [BTW: once we have this machinery online, it's conceivable that we'd
          want to flush to RAMDir even in the non-NRT case. EG, say DW's RAM
          buffer is full and it's time to flush. If it flushes to RAM,
          typically the RAMDir is far more compact than DW's RAM buffer and it
          then still has some more space to work with, before having to flush to
          disk. If we explore this it should be in a new issue (later)...]

          > StoredFieldsWriter & TermVectorsTermsWriter now write to
          > IndexWriter.getFlushDirectory(), which is confusing because that
          > method returns the RAMDir if set? Shouldn't this be the
          > opposite? (Ie it should flush to IndexWriter.getDirectory()? Or
          > we should change getFlushDirectory to NOT return the
          > ramdir?)

          The attached patch uses FileSwitchDirectory, where these files
          are written to the primary directory (IW.directory). So
          getFlushDirectory is ok?

          OK, though I'd like to simply always use FSD, even if primary &
          secondary are the same dir. All these if's checking for both dirs,
          passing both dirs deep into Lucene's APIs, etc., are spooky.

          > Why did you need to add synchronized to some of the
          > SegmentInfo files methods? (What breaks if you undo that?). The
          > contract here is IW protects access to SegmentInfo/s

          SegmentInfo.files was being cleared while sizeInBytes was called,
          which resulted in an NPE. The alternative is to sync on IW in
          IW.size(SegmentInfos), which seems a bit extreme just to obtain
          the size of a segment info?

          But... why did we have one thread asking for size while another was
          tweaking the SegmentInfo? What leads to that? We need to better
          understand the root cause here.

          The size consumed by the RAM segments should be carefully computed
          (called only in synchronized(iw) context) and then shared. This value
          changes relatively rarely (on flushing a new segment to ram; on
          applying deletes that include RAM segments; on doing a ram->ram
          merge), but is read frequently (per doc added, to decide whether it's
          time to flush). I think the value should be pushed to DW whenever it
          changes, via a synchronized method in DW; and then the existing
          synchronized logic in DW that decides if it's time to "flush after"
          should consult that value. No further synchronizing should be
          necessary.
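
          (A sketch of that handshake; the class and method names are
          stand-ins, not actual DocumentsWriter code:)

            class DocumentsWriterSketch {
              private long ramSegmentsSize; // bytes held by flushed ram segments
              private long ramBufferUsed;   // in-memory postings buffer usage
              private final long maxRamBytes = 32 * 1024 * 1024;

              // IW recomputes the segment total under its own lock (rarely:
              // on flush, on applying deletes, on a ram->ram merge) and
              // pushes it here.
              synchronized void setRamSegmentsSize(long bytes) {
                ramSegmentsSize = bytes;
              }

              // Checked per added doc; no locking beyond DW's own.
              synchronized boolean timeToFlush() {
                return ramSegmentsSize + ramBufferUsed >= maxRamBytes;
              }
            }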

          Also, this ram size should be used not only for deciding when it's
          time to merge to a disk segment, but also when it's time for DW to
          flush a new segment (which I think your current patch is missing?).

          > The MergePolicy needs some smarts when it's dealing w/ RAM. EG it
          > should not do a merge of more than XXX% of total RAM usage (should
          > flush to the real directory instead).

          Isn't this handled well enough in updatePendingMerges or is
          there more that needs to be done?

          There is more that needs to be done, because MergePolicy must
          conditionalize its logic based on RAM vs FS. Ie, if our RAM buffer is
          32 MB, and there are say 31 MB of RAM segments that suddenly need
          merging (because we just flushed the 10th RAM segment), we should
          not do a RAM -> RAM merge at that point (because 31/32 is a very
          high percentage of our net RAM buffer). Instead we should force
          RAM -> disk at that
          point, even though technically RAM is not yet full.

          Ooh: maybe a better approach is to disallow the merge if the expected
          peak RAM usage will exceed our buffer. I like this better. So if
          budget is 32 MB, and net RAM used (segments + DW) is say 22, we have a
          10 MB "budget", so we are allowed to select merges that total to < 10
          MB.
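
          (The budget rule written out; with a 32 MB budget and 22 MB in
          use, only merges totalling under 10 MB may run in RAM:)

            // All-RAM merges must fit in the headroom left under the
            // configured buffer; anything bigger merges RAM -> disk.
            static boolean canMergeInRam(long budgetBytes, long usedBytes,
                                         long proposedMergeBytes) {
              long headroom = budgetBytes - usedBytes; // 32 MB - 22 MB = 10 MB
              return proposedMergeBytes < headroom;    // false => go to disk
            }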

          Jason Rutherglen added a comment -

          I don't like how "deep" the dichotomy of "RAMDir vs
          FSDir"

          Agreed, it's a bit awkward but I don't see another way to do
          this. The good thing is if IW has written some .fdt files to the
          main dir (via FSD), IW crashes, then IW is created again, IFD
          automatically deletes the extraneous .fdt (and other extension)
          files.

          Why can't we push FSD down to all these places (IFD,
          SegmentInfo/s, etc.)?

          Could we simply make the single CMS instance smart enough
          to realize that a single RAM merge is allowed to proceed
          regardless of the thread limit?

          Hmm... I think for benchmarking it would be good to allow
          options as we simply don't know. In the latest patch a ram
          mergescheduler can be set to the IndexWriter.

          have to fix FSD to understand CFX must go to the dir
          too

          I think this is fixed in the patch, where compound files are not
          created in RAM.

          You're saying we should have IW create the ramdir by default
          after getReader is called and remove the IW ramdir constructor?
          Right. This should be "under the hood".

          Ok, this will require some reworking of the patch.

          OK, though I'd like to simply always use FSD, even if
          primary & secondary are the same dir.

          How will always using FSD work? Doesn't it assume writing to two
          different directories?

          this ram size should be used not only for deciding when
          it's time to merge to a disk segment, but also when it's time
          for DW to flush a new segment

          In the new patch this is fixed.

          So if budget is 32 MB, and net RAM used (segments + DW)
          is say 22, we have a 10 MB "budget", so we are allowed to select
          merges that total to < 10 MB.

          One issue is the ram buffer flush doubles the ram used (because
          the segment is flushed as is to the RAM dir). You're saying
          roughly estimate the ram size used on the result of a merge and
          have the merge policy take this into account? This makes sense,
          otherwise we will consistently (if temporarily) exceed the ram
          buffer size. The algorithm is fairly simple? Find segments whose
          total sizes are lower than whatever we have left of the max ram
          buffer size? I have new code, but will rework it a bit to
          include this discussion.

          Michael McCandless added a comment -

          > OK, though I'd like to simply always use FSD, even if
          > primary & secondary are the same dir.

          How will always using FSD work? Doesn't it assume writing to two
          different directories?

          I think on creating IW the user should state (via new expert ctor)
          that they intend to use it for NRT (say, a new boolean
          "enableNearRealTime").

          Then we could pass IFD either an FSD (when in NRT mode) or the normal
          directory when not in NRT mode. IFD would no longer have to
          duplicate FSD's logic (summing the two dirs' listAlls, the
          getDirectoryForFile).

          SegmentInfos.hasExternalSegments, and MultiSegmentReader ctor, should
          be "smart" when they're passed an FSD (probably we should add
          Directory.contains(Directory) method, which by default returns true if
          this.equals(dir), but FSD would override to return true if the
          incoming dir .equals primary & secondary).
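
          As a hypothetical sketch (contains() does not exist yet; the FSD
          accessors are assumed to be getPrimaryDir/getSecondaryDir):

            // Proposed default in Directory:
            public boolean contains(Directory dir) {
              return this.equals(dir);
            }

            // Proposed override in FileSwitchDirectory:
            public boolean contains(Directory dir) {
              return dir.equals(getPrimaryDir()) || dir.equals(getSecondaryDir());
            }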

          Likewise all the switching in DW to handle two dirs should be rolled
          back (eg you added DW.fileLength(name, dir1, dir2) that's dup code with
          FSD).

          One issue is the ram buffer flush doubles the ram used (because
          the segment is flushed as is to the RAM dir).

          I think we must keep transient RAM usage below the specified limit, so
          that limits our flushing freedom. Ie, in the NRT case, once DW's RAM
          buffer exceeds half of the allowed remaining RAM budget (ie, the limit
          minus total RAM segments) then we trigger a flush to RAM and then to
          the "real" dir.

          Or... we could flush the new segment directly to the real dir as one
          segment, and merge all prior RAM segments as a separate new segment in
          the main dir, if the free RAM is large enough.
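
          A rough sketch of that first trigger, with illustrative names
          (flushToRAM and the byte counters are not from the patch):

            // Flush once DW's buffer passes half of the remaining budget.
            long remaining = ramBudgetBytes - ramSegmentsBytes;
            if (dwBufferBytes > remaining / 2) {
              flushToRAM();   // and then RAM -> disk, if RAM is now too full
            }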

          > this ram size should be used not only for deciding when
          > it's time to merge to a disk segment, but also when it's time
          > for DW to flush a new segment

          In the new patch this is fixed.

          I don't see where this is taken into account? Did you mean to attach
          a new patch?

          Jason Rutherglen added a comment -
          • In DocumentsWriter.balanceRAM, if NRT is on, the total ram
            consumed is "(numBytesUsed * 2) + writer.getRamDirSize()".
            numBytesUsed is the current consumption of the ram buffer.
            Basically, whatever we flush to ram consumes that much of the
            buffer again. This is now taken into account in the bufferIsFull
            calculation (see the sketch after this list).
          • Double dir usage should be factored out.
          • TestIndexWriterRamDir.testFSDirectory fails. It tries to
            simulate a crashing IW. When the IW is created again it should
            delete the old files, but for some reason it doesn't with
            FSDirectory (open file handles on Windows, perhaps)
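
          The accounting from the first bullet, sketched with illustrative
          field names:

            // Hedged sketch: the buffer is counted twice because flushing
            // copies it, roughly byte for byte, into the ram dir.
            long totalRam = (numBytesUsed * 2) + ramDirSizeBytes;
            boolean bufferIsFull = totalRam >= ramBufferSizeBytes;  // flush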

          we could flush the new segment directly to the real dir
          as one segment, and merge all prior RAM segments as a separate
          new segment in the main dir, if the free RAM is large enough.

          Yeah it's unclear what the best policy is here. Do we want to
          have some sort of custom merge policy method/class to take care
          of this so the user can customize it?

          Jason Rutherglen added a comment -

          Did we decide to simply add a boolean param in the ctor to turn
          on NRT instead of relying on getReader? Using getReader could
          cause problems with switching directories midstream.

          Michael McCandless added a comment -

          Did we decide to simply add a boolean param in the ctor to turn
          on NRT instead of relying on getReader? Using getReader could
          cause problems with switching directories midstream.

          Yes, let's switch to that.

          Michael McCandless added a comment -
          • We shouldn't add FSD.setPrimaryExtensions?
          • The dual directories is continuing to push deeper (when I'm
            wanting to do the reverse). EG, MergeScheduler.getDestinationDirs
            should not be needed?
          • We should no longer need IndexWriter.getFlushDirectory? IE, IW
            once again has a single "Directory" as seen by IFD,
            DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD;
            in the non-NRT case it's the Dir that was passed in (unless, in a
            future issue, we explore using FSD, too, for better performance).
          • We can't up and change CMS.handleMergeException (breaks
            back-compat); can you deprecate old & add a new one that calls old
            one? Let's have the new one take a Throwable and
            MergePolicy.OneMerge?
          • Instead of overriding "equals" (FSD.equals) can you change to
            "Directory.contains"?
          • IW's RAMDir usage still isn't factored in properly. EG
            DW.balanceRAM is not taking it into account.
          • Furthermore, we can't call writer.getRamDirSize() from
            DW.balanceRAM – that's far too costly. Instead, whenever RAMDir
            changes (deletes are applied, or a new RAM segment is created), we
            must push down to DW that usage with a new synchronized method.
            (I described this above). We should remove
            IW.getRamDirSize()... ie, this size should always be "pushed on
            change", not "polled on read". We change it rarely and read it
            incredibly often (see the sketch after this list).
          • We don't need IW.getRamLogMergePolicy()? Instead, let's ignore
            "MergePolicy.useCompoundFile()" when we are writing the new
            segment to RAMDir? Likewise we should not cast RAMMergePolicy to
            LogMergePolicy in setRAMMergePolicy, nor turn off its CFS there.
          • I still don't think we need a separate RAMMergeScheduler; I think
            CMS should simply always run such merges (ie not block on max
            thread count). IW.getNextMerge can then revert to its former
            self.
          • MergePolicy.OneMerge.segString no longer needs to take a
            Directory (because it now stores a Directory).
          • The mergeRAMSegmentsToDisk shouldn't be fully synchronized, eg
            when doWait is true it should release the lock while merges are
            taking place.
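
          For the "pushed on change" point above, a minimal sketch (the
          method names are illustrative; RAMDirectory.sizeInBytes() is real):

            // In DocumentsWriter: written rarely, read per added doc.
            private long ramDirBytes;
            synchronized void setRamDirBytes(long bytes) { ramDirBytes = bytes; }
            synchronized long getRamDirBytes() { return ramDirBytes; }

            // In IndexWriter, only where the RAMDir actually changes (new RAM
            // segment flushed, deletes applied, RAM->RAM merge finished):
            docWriter.setRamDirBytes(ramDirectory.sizeInBytes());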
          Jason Rutherglen added a comment -

          RAMDir changes (deletes are applied, or a new RAM segment is
          created), we must push down to DW that usage with a new synchronized
          method.

          Sounds like we create a subclass of RAMDirectory with this
          functionality?

          We don't need IW.getRamLogMergePolicy()?

          Because we don't want the user customizing this?

          We should no longer need IndexWriter.getFlushDirectory? IE, IW
          once again has a single "Directory" as seen by IFD,
          DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD; in
          the non-NRT case it's the Dir that was passed in (unless, in a future
          issue, we explore using FSD, too, for better performance).

          Pass in FSD in the constructor of DocumentsWriter (and others) as
          before?

          I still don't think we need a separate RAMMergeScheduler; I
          think CMS should simply always run such merges (ie not block on max
          thread count). IW.getNextMerge can then revert to its former
          self.

          Where does the thread come from for this if we're using max threads?
          If we allocate one, we're over limit and keeping it around. We'd need
          a more advanced thread pool that elastically grows and kills
          threads that are unused over time. With Java 1.5 we can use
          ThreadPoolExecutor. Is a dedicated thread pool something we want to
          go to? Even then we can potentially still max out a given thread pool
          with requests to merge one directory or the other. We'd probably
          still need two separate thread pools.
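
          For reference, a sketch of what an elastic pool could look like
          with java.util.concurrent on JDK 1.5 (not something in the patch):

            import java.util.concurrent.SynchronousQueue;
            import java.util.concurrent.ThreadPoolExecutor;
            import java.util.concurrent.TimeUnit;

            // Zero core threads, a hard cap, idle threads reaped after 60s.
            // SynchronousQueue hands each merge straight to a thread; by
            // default, submissions beyond the cap are rejected, not queued.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, maxThreadCount, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());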

          MergePolicy.OneMerge.segString no longer needs to take a
          Directory (because it now stores a Directory).

          Yeah, I noticed this, I'll change it. MergeSpecification.segString is
          public and takes a directory that is not required. What to do?

          Jason Rutherglen added a comment -

          The dual directories is continuing to push deeper (when I'm
          wanting to do the reverse). EG, MergeScheduler.getDestinationDirs
          should not be needed?

          If we remove getFlushDirectory, are you saying getDirectory should
          return the FSD if RAM NRT is turned on? This seems counterintuitive
          in that we still need a clear separation of the two directories? The
          user would expect the directory they passed into the ctor to be
          returned?

          getDestinationDirs is used by the ram merge scheduler, which if we
          use a single CMS would go away.

          I'm looking at how to get RAMLogMergePolicy to take into account the
          size of the ram segments it's merging such that they do not total
          beyond the remaining available ram. Looks like we could keep a
          running byte total while it's building the merges and stop once we've
          reached the limit, though I'm not sure how exact this is (will the
          merges be balanced using this system?). It seems like a variation on
          the LogByteSizeMergePolicy; however, it's unclear whether
          LogDocMergePolicy or LogByteSizeMergePolicy ram merges will perform
          better (does it matter since it's all in ram and we're capping the
          total?)

          Jason Rutherglen added a comment -

          I'm not sure we have the right model yet for deciding when to
          flush the ram buffer and/or ram segments. Perhaps we can simply
          divide the ram buffer size in half, allocating one part to the
          ram buffer, the other to the ram segments. When one exceeds its
          (rambuffersize/2) allotment, it's flushed to disk. This way if
          the ram buffer size is 32MB, we will always safely flush 16MB to
          disk. The more ram allotted, the greater the size of what's flushed
          to disk. We may eventually want to offer an expert method to set
          the ram buffer size and ram dir max size individually.
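
          The 50/50 model as a sketch (all names are illustrative):

            // Each half flushes to disk independently once it exceeds its
            // (ramBufferSize / 2) allotment.
            long allotment = ramBufferSizeBytes / 2;
            if (dwBufferBytes > allotment)
              flushBufferToDisk();       // hypothetical
            if (ramDirBytes > allotment)
              mergeRamSegmentsToDisk();  // hypothetical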

          Put another way, I think we need a balanced upper limit for the
          ram buffer and the NRT ram dir, which seems (to me) hard to
          achieve if one is allowed too much growth at the expense of the
          other.

          I'd like to stay away from flushing the ram buffer to disk when
          it's below say 20% of the ram buffer size as it seems
          inefficient to do this (because we'll have to do an expensive
          disk merge on it later). On the other hand if the user is not
          calling getReader very often and we're auto flushing at 1/2 the
          ram buffer size, we're short-changing ourselves and only
          flushing a segment half the size of what it could be. I suppose
          we could stick with the 1/2 model, only turning it on once ram
          segments are being merged in ram?

          If when merging ram segments (using the specialized
          RAMMergePolicy) we only merge in ram the ones that fit, what do
          we do with the ram segments remaining that need to be flushed to
          disk? What if they only make up 20% of the total size of the
          ram segments? If we merge the 20% to disk it seems inefficient?

          Michael McCandless added a comment -

          > RAMDir changes (deletes are applied, or a new RAM segment is
          > created), we must push down to DW that usage with a new synchronized
          > method.

          Sounds like we create a subclass of RAMDirectory with this
          functionality?

          I don't think that's needed. I think whenever IW makes a change to
          the RAMDir, which is easily tracked, it pushes to DW the new RAMDir
          size.

          > We don't need IW.getRamLogMergePolicy()?

          Because we don't want the user customizing this?

          That, and because it's only used to determine CFS or not, which we've
          turned off for RAMDir.

          > We should no longer need IndexWriter.getFlushDirectory? IE, IW
          > once again has a single "Directory" as seen by IFD,
          > DocFieldProcessorPerThread, etc. In the NRT case, this is an FSD; in
          > the non-NRT case it's the Dir that was passed in (unless, in a future
          > issue, we explore using FSD, too, for better performance).

          Pass in FSD in the constructor of DocumentsWriter (and others) as
          before?

          Right. All these places couldn't care less whether they are dealing w/ FSD or
          a "real" dir. They should simply use the Directory API as they
          previously did.

          > I still don't think we need a separate RAMMergeScheduler; I
          > think CMS should simply always run such merges (ie not block on max
          > thread count). IW.getNextMerge can then revert to its former
          > self.

          Where does the thread come from for this if we're using max threads?
          If we allocate one, we're over limit and keeping it around. We'd need
          a more advanced threadpool that elastically grows the thread pool and
          kills threads that are unused over time. With Java 1.5 we can use
          ThreadPoolExecutor. Is a dedicated thread pool something we want to
          go to? Even then we can potentially still max out a given thread pool
          with requests to merge one directory or the other. We'd probably
          still need two separate thread pools.

          The thread is simply launched w/o checking maxThreadCount, if the
          merge is in RAM.
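
          In CMS terms, roughly (a sketch; the in-RAM test is illustrative,
          and InterruptedException handling is omitted):

            synchronized (this) {
              boolean inRAM = merge.directory == ramDirectory;
              while (!inRAM && mergeThreadCount() >= maxThreadCount) {
                wait();   // only disk merges respect maxThreadCount
              }
              // RAM merges fall through and get a merge thread immediately
            }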

          Right, with JDK 1.5 we can make CMS better about pooling threads.
          Right now it does no long-term pooling (unless another merge happens
          to be needed when a thread finishes its last merge).

          > MergePolicy.OneMerge.segString no longer needs to take a
          > Directory (because it now stores a Directory).

          Yeah, I noticed this, I'll change it. MergeSpecification.segString is
          public and takes a directory that is not required. What to do?

          Do the usual back-compat dance – deprecate it and add the new one.

          > The dual directories is continuing to push deeper (when I'm
          > wanting to do the reverse). EG, MergeScheduler.getDestinationDirs
          > should not be needed?

          If we remove getFlushDirectory, are you saying getDirectory should
          return the FSD if RAM NRT is turned on? This seems counter intuitive
          in that we still need a clear separation of the two directories? The
          user would expect the directory they passed into the ctor to be
          returned?

          I agree, we should leave getDirectory() as is (returns whatever Dir
          was passed in).

          We can keep getFlushDirectory, but it should not have duality inside it
          – it should simply return the FSD (in the NRT case) or the normal
          dir. I don't really like the name getFlushDirectory... but can't
          think of a better one yet.

          Then, nothing outside of IW should ever know there are two directories
          at play. They all simply deal with the one and only Directory that IW
          hands out.

          On the "when to flush to RAM" question... I agree it's tricky. This
          logic belongs in the RAMMergePolicy. That policy needs to be
          empowered to decide if a new flush goes to RAM or disk, to decide when
          to merge all RAM segments to a new disk segment, to be able to check
          if IW is in NRT mode, etc. Probably the RAM merge policy also needs
          control over how much of the RAM buffer it's going to give to DW,
          too. At first the policy should not change the non-NRT case (ie one
          always flushes straight to disk). We can play w/ that in a separate
          issue. Need to think more about the logic...

          Jason Rutherglen added a comment -

          I don't think that's needed. I think whenever IW makes a
          change to the RAMDir, which is easily tracked, it pushes to DW
          the new RAMDir size.

          Because we know the IW.ramdir is a RAMDirectory implementation,
          we can use sizeInBytes? It's synchronized, maybe we want a
          different method that's not? It seems like keeping track of all
          file writes outside the ramdir is going to be difficult? For
          example when we do deletes via SegmentReader how would we keep
          track of that?

          That, and because it's only used to determine CFS or not,
          which we've turned off for RAMDir.

          So we let the user set the RAMMergePolicy but not get it?

          The thread is simply launched w/o checking
          maxThreadCount, if the merge is in RAM.

          Hmm... We can't just create threads and let them be garbage
          collected as JVMs tend to throw OOMs with this. If we go down
          this route of a single CMS, maybe we can borrow some code from
          an Apache project that's implemented a threadpool.

          Michael McCandless added a comment -

          > I don't think that's needed. I think whenever IW makes a
          > change to the RAMDir, which is easily tracked, it pushes to DW
          > the new RAMDir size.

          Because we know the IW.ramdir is a RAMDirectory implementation,
          we can use sizeInBytes? It's synchronized, maybe we want a
          different method that's not? It seems like keeping track of all
          file writes outside the ramdir is going to be difficult? For
          example when we do deletes via SegmentReader how would we keep
          track of that?

          We should definitely just use the sizeInBytes() method.

          I'm saying that IW knows when it writes new files to the RAMDir
          (flushing deletes, flushing new segment) and it's only at those times
          that it should call sizeInBytes() and push that value down to DW.

          > That, and because it's only used to determine CFS or not,
          > which we've turned off for RAMDir.

          So we let the user set the RAMMergePolicy but not get it?

          Oh, we should add a getter (getRAMMergePolicy, not getLogMergePolicy)
          for it, but it should return MergePolicy not LogMergePolicy.

          > The thread is simply launched w/o checking
          > maxThreadCount, if the merge is in RAM.

          Hmm... We can't just create threads and let them be garbage
          collected as JVMs tend to throw OOMs with this. If we go down
          this route of a single CMS, maybe we can borrow some code from
          an Apache project that's implemented a threadpool.

          This is how CMS has always been. It launches threads relatively
          rarely – this shouldn't lead to OOMs. One can always subclass CMS if
          this is somehow a problem. Or we could modify CMS to pool its threads
          (as a new issue)?

          Jason Rutherglen added a comment -

          In the patch the merge policies are split up, which requires some
          of the RAM NRT logic to be in updatePendingMerges.

          One solution is to have a merge policy that manages merging to
          ram and to disk, kind of an overarching merge policy for the
          primary MP and the RAM MP. This would push the logic of ram
          merging and primary dir merging to the meta merge policy, which
          would free IW from managing ram segs vs. prim segs.

          Do IW.optimize and IW.expungeDeletes operate on the ramdir as
          well? (The expungeDeletes javadoc implies calling
          IR.numDeletedDocs will return zero when there are no deletes.)

          Jason Rutherglen added a comment -

          Something in the DocumentsWriter API we may need to change is to
          allow passing a directory through the IndexingChain. In the RAM
          NRT case, which directory we write to can change depending on
          whether the ram buffer has exceeded its maximum available size.
          If it is under half the available ram it will go to the ram dir;
          if not, the new segment will be written to disk. For this reason we
          can't simply pass a directory into the constructor of
          DocumentsWriter, nor can we rely on calling
          IW.getFlushDirectory. We should be able to rely on the directory
          in SegmentWriteState?

          Michael McCandless added a comment -

          Do IW.optimize and IW.expungeDeletes operate on the ramdir as
          well? (The expungeDeletes javadoc implies calling
          IR.numDeletedDocs will return zero when there are no deletes.)

          I think IW.optimize should mean all RAM segments are merged into the single on-disk segment?

          And IW.expungeDeletes should also apply to RAM segments, ie if RAM segments have pending deletes, they are merged away (possibly entirely in RAM, ie the RAM merge policy could simply merge to a new RAM segment)?

          In the patch the merge policies are split up which requires some
          of the RAM NRT logic to be in updatePendingMerges.
          One solution is to have a merge policy that manages merging to
          ram and to disk,

          It looks like it's the "is it time to flush to disk" logic, right? Why can't we make that the purview of the RAM MergePolicy? We may need to extend MergePolicy API to tell it how much RAM is free in the budget.

          We should be able to rely on the directory in SegmentWriteState?

          I think we should fix the indexing chain to always use SegmentWriteState's Directory and not pass Directory to the ctors? Does something go wrong if we take that approach?
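
          Ie, roughly (a sketch; field names approximate SegmentWriteState
          of this era):

            // The chain resolves the directory per flush from the state,
            // instead of caching a Directory at construction time.
            void flush(SegmentWriteState state) throws IOException {
              Directory dir = state.directory;  // RAM or disk, chosen by IW
              IndexOutput out = dir.createOutput(state.segmentName + ".fdt");
              try {
                // ... write the stored fields for this segment ...
              } finally {
                out.close();
              }
            }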

          Jason Rutherglen added a comment -

          IW.optimize should mean all RAM segments are merged into
          the single on-disk segment

          Yes.

          IW.expungeDeletes should also apply to RAM segments, ie
          if RAM segments have pending deletes, they are merged away
          (possibly entirely in RAM, ie the RAM merge policy could simply
          merge to a new RAM segment)?

          Yes.

          we should fix the indexing chain to always use
          SegmentWriteState's Directory and not pass Directory to the
          ctors

          Yep.

          The next patch will have these features.

          Jason Rutherglen added a comment -
          • A single merge scheduler is used. We will need to open a new
            issue for a version of ConcurrentMergeScheduler that allocates
            threads perhaps based on the merge.directory? We'd also probably
            want to add thread pooling.
          • There's a package protected IW ctor that accepts the ram dir.
            This is used in the test case for ensuring we aren't creating
            .cfs files in the ram dir.
          • IW.optimize merges all segments (ram included) to the primary
            dir
          • IW.expungeDeletes merges segments with deletes; in-ram ones
            stay in ram (unless they won't fit), and primary-dir ones are
            handled as usual
          • Added testOptimize, testExpungeDeletes, and some other test
            cases
          • Needs a test case to make sure we're merging to the primary
            dir when the ram dir is full or a flush won't fit in the ram dir
          • There's a mergeRamSegmentsToDir and resolveRamSegments. Two
            different methods because mergeRamSegmentsToDir operates by
            simply scheduling merges, resolveRamSegments operates in the
            foreground like resolveExternalSegments. I'm not sure if we can
            combine the two. resolveRamSegments seems to have a thread
            notification problem and so hangs at times. I'll look into this
            further unless it's obvious what the problem is.
          • When RAM NRT is on (via the IndexWriter constructor), setting
            the ram buffer size allocates half of the given number to the
            DocumentsWriter buffer and half to the ram dir. It may be best
            to dynamically change these numbers based on usage etc.
          • Added NRTMergePolicy which is used only when RAM NRT is on. It
            utilizes the regular merge policy and the ram merge policy.
          • The ram dir size is pushed to DocumentsWriter
          • RAMMergePolicy extends LogDocMergePolicy and defaults the
            useCompoundFile and useCompoundDocStore to false
          • Sorry for the whitespace stuff; I'll clean it up later. I
            wanted to post the latest to get feedback
          Jason Rutherglen added a comment -

          I think the easiest way to handle the ram buf size vs. the ram
          dir size is to allow each to grow on request. I have some code
          I need to test that implements it. This way we're growing based
          on demand and availability. The only thing we may want to add is
          a way to grow and perhaps automatically flush based on the
          growth requested, and perhaps prioritize requests?

          Jason Rutherglen added a comment -
          • All tests pass, added more tests
          • Added DocumentsWriter.growRamBufferBy/growRamDirMaxBy methods
            that allow dynamically requesting more ram. We start off at
            50/50, ramdir/rambuffer. Then whenever one needs more, grow* is
            called.
          • We need a RAMPolicy class that allows customizing how ram is
            allocated. Currently the ramdir and the rambuffer compete for
            space; the user will presumably want to customize this (see the
            sketch after this list).
          • I'm not sure the flushing always occurs when it should, and
            not sure yet how to test to ensure it's flushing when it should
            (other than watching a log). What happened to the patch for
            adding logging to Lucene?
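
          Purely as a strawman for that RAMPolicy idea (nothing like this
          exists yet):

            // Hypothetical policy: decides how the total budget is split
            // between DW's ram buffer and the NRT ram dir as usage changes.
            public interface RAMPolicy {
              long bufferAllotment(long totalBudget, long bufferUsed, long ramDirUsed);
              long ramDirAllotment(long totalBudget, long bufferUsed, long ramDirUsed);
            }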
          Michael McCandless added a comment -

          I think generally we are close. I have lots of little comments from
          looking through the patch:

          • Can you update the CHANGES entry to something like "IndexWriter
            now uses RAM more efficiently when in near real-time mode"? (Ie
            we don't pass RAMDir to IW).
          • DW.push/getRAMDirSize, RAMTotalMax, RAMBufferAvailable, etc. need
            to be synchronized?
          • Since IW.flushDocStores always goes to the main directory, why
            does it now take a Directory arg?
          • I don't think doAfterFlush should be responsible for calling
            pushRamDirSize(); that's more of a hook for external subclasses.
          • Yes, IW.ramSizeInBytes() should include the ramDir's bytes
          • There are still places where Directory.contains should be used,
            instead of pulling both dirs and checking each (a sketch
            follows this list). EG, the assert in
            DW.applyDeletes, and this assert in IW:
            if (ramNrt && merge.directory == switchDirectory) {
              assert !merge.useCompoundFile;
            }
            

            I'd like to eliminate IW.getInternalDirectory, if possible: to
            anyone interacting with IW, there is only one Directory, and the
            switching is entirely "under the hood".

          • I realized there is in fact a benefit to using CFS in RAM: much
            better RAM efficiency for tiny segments (because RAMDir's buffer
            size is 1 KB). Though such segments would presumably be merged
            away with time, so it may not be a big deal...
          • Is IW.mergeRAMSegmentToDir only for testing?
          • Can you name things theRAMSetting instead of theRamSetting? (Ie,
            RAM is all caps).
          • For IW.resolveRAMSegments, maybe we should make a single merge
            that merges everything down? Why even bother interacting with a
            merge policy, here?
          • Can you rename flush()'s new arg "flushToRAM" to
            "allowFlushToRAM"? Ie, even when this is true, that method may
            decide RAM is full and in fact flush to the real dir.
          • Can you rename IW.ramNRT to IW.flushToRAM? (Since it's in fact
            orthogonal to NRT).
          • It's sneaky to set docWriter.flushToDir before calling
            docWriter.flush; can't we make that an arg to docWriter.flush?
            (And docWriter would never store it).
          • Why did you need to add DW.fileLength?
          • IW.SWITCH_FILE_EXTS should be private static final (not public)?
          • We lost private on a number of attrs in IW – can you restore?
            (You should insert nocommit comments when you do that, to reduce
            risk that such changes slip in).
          • Likewise for SegmentReader.coreRef.
          • Why did you need to make RAMDir.sizeInBytes volatile? Isn't it
            always updated/accessed from sync(RAMDir) context?
          • Why do we need a new class RAMMergePolicy? (There's no API
            difference over MergePolicy). Can't we simply by default
            instantiate LogByteSizeMergePolicy, and set CFS/CFX to false?
          • IW.fileSwitchDirectory should be private?
          • Have you done any perf tests with flushToRAM = true? EG should we
            enable it by default? I think if we have a good policy for
            managing RAM it could very well be higher performance. But, we
            should explore this under a different issue, so leave the default
            at "no ram dir".

          On the "how to share RAM" between RAMDir & DW's RAM buffer... instead
          of pre-dividing and growing over time, I think we can simplify it by
          logically sharing a single "pool".

          The RAMDir only alters its ram usage when 1) we flush a new segment to
          it, 2) a merge completes (either writing to the real dir or to the ram
          dir), or 3) deletes are applied to segments in RAM. When such a
          change happens we notify DW, which then adds that base into its
          own ram consumption to decide when it's time to flush.

          For starters, and we can optimize this later, I don't think DW should
          choose on its own to flush itself to the RAMDir? That should only
          happen when getReader is called, and there's still plenty of RAM
          free.

          So what happens is... each time getReader() is called, we make a new
          smallish RAM segment. Over time, these RAM segments need merging so
          we merge them. (If such a merge is fairly large, probably instead of
          writing to ram it should write the new segment to the real dir, since
          intermediate RAM usage will be too high).

          At some point, DW detects that the RAMDir size plus its own buffer is
          at the limit. If DW's buffer is relatively small, it should probably
          simply flush to the RAMDir then dump entire RAMDir to the real dir as
          a single merge. If DW's buffer is big, as would happen if you opened
          an NRT reader but never actually called getReader(), it should flush
          straight to the real dir.

          One challenge we face is ensuring that while we are flushing all ram
          segments to disk, we don't block the getReader() turnaround. IE we
          can't make getReader() do that flush synchronously. So that needs to
          be a BG merge, but we must somehow temporarily disregard the size of
          those segments while the merge is running. Or, perhaps we "merge RAM
          segments to disk" a bit early, eg once RAM consumed is > 90% of the
          total RAM buffer, or something.
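
          A minimal sketch of the shared-pool accounting described above;
          the RAMPool name and method signatures are illustrative, not
          part of the patch:

             // One budget logically shared by the RAMDir and DW's buffer.
             class RAMPool {
               private final long maxBytes;
               private long ramDirBytes;       // updated on flush / merge / delete events

               RAMPool(long maxBytes) { this.maxBytes = maxBytes; }

               // IndexWriter calls this whenever the RAMDir's size changes.
               synchronized void updateRAMDirSize(long bytes) { ramDirBytes = bytes; }

               // DocumentsWriter consults this to decide when it's time to flush.
               synchronized boolean shouldFlush(long dwBufferBytes) {
                 return ramDirBytes + dwBufferBytes >= maxBytes;
               }
             }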

          Jason Rutherglen added a comment -

          I started on the pooled ram model after the last patch because
          it is cleaner. Bytes are allocated up to the given limit set by
          IW.setRAMBufferSizeMB. As mentioned below, we may want to add a
          setting for the max ram temporarily used.

          I'm reusing DocumentsWriter.numBytesAlloc/numBytesUsed and
          created a RAMPolicy that manages ramDirBytesAlloc and
          ramDirBytes. Each time a merge is scheduled, the
          sizeof(segments) is allocated by RAMPolicy and the segmentsAlloc
          is stored in OneMerge. Once the merge completes or fails,
          ramDirBytesAlloc is adjusted by the difference between the
          actual bytes used and OneMerge.ramDirAlloc. This way we always
          have the most accurate ramDir allocation in RAMPolicy, and we
          properly adjust the amount of ram consumed. This works well with
          our concurrent merging model, where we can't predict when a
          merge will complete. (A sketch of this bookkeeping follows.)
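
          A hedged sketch of that per-merge bookkeeping; RAMPolicy and the
          OneMerge field come from this patch, but these method signatures
          are guesses:

             // Reserve an estimate when the merge is scheduled.
             synchronized long allocForMerge(SegmentInfos toMerge) throws IOException {
               long estimate = 0;
               for (int i = 0; i < toMerge.size(); i++) {
                 estimate += toMerge.info(i).sizeInBytes();  // sizeof(segments)
               }
               ramDirBytesAlloc += estimate;   // stored in OneMerge as segmentsAlloc
               return estimate;
             }

             // Reconcile the reservation once the merge completes or fails.
             synchronized void mergeFinished(long alloc, long actualBytesUsed) {
               ramDirBytesAlloc += actualBytesUsed - alloc;
             }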

          One challenge we face is ensuring that while we are
          flushing all ram segments to disk, we don't block the
          getReader() turnaround. IE we can't make getReader() do that
          flush synchronously....perhaps we "merge RAM segments to disk" a
          bit early, eg once RAM consumed is > 90% of the total RAM
          buffer

          You're talking about the synchronization in IW.doFlushInternal,
          which would block getReader while writing a segment to disk? Our
          default RAMPolicy should be one where we always flush the ram
          buffer to the ramdir. Basically there must always be room in the
          ram dir for the ram buffer: ramdir + (rambuf * 2) < maxSize. Or
          do we assume that it's ok for ramUsed to temporarily exceed
          ramMax by a given percentage (say 110%, which would be an option
          in RAMPolicy) while ramBuf is being flushed to ramDir?

          We may want to make some assumptions about usage of getReader
          (i.e. getReader is called fairly often such that the rambuffer
          is usually less than half of the ram used) when flushToRam=true
          so that we can get a version of this functionality out the door,
          then iterate as we gather feedback from users?

          I'll include the comments in the next patch.

          Jason Rutherglen added a comment -

          I had forgotten about concurrency with the docstores (keeping an
          open IndexInput and IndexOutput) when using an FSDirectory. I
          wrote a test (which fails), so getting this functionality to
          work could require some reworking of FSDirectory internals?
          (Something on the order of auto-updating IndexInput's buffers
          and file length as the IndexOutput is flushed?)

          We need simultaneous IndexInput and IndexOutput operations to
          work, as the docstore is streamed to the FSDir for multiple
          segments. After a segment is flushed to the ramdir it can be
          read from, including the docstore, which is still actively being
          written to (for the new segments)?

          Michael McCandless added a comment -

          I had forgotten about concurrency with the docstores

          That's a big (and good) change; I think we should save that one for another issue, and leave this one focusing on flushing segments through a RAMDir?

          Jason Rutherglen added a comment -

          I guess I'm confused about the doc stores. We keep one open on
          disk for multiple segments being created via the FSD, which
          means the IndexOutput isn't closed; however, for each new
          SegmentReader that's opened, we're creating a new IndexInput
          only after the segment is flushed (to the FSD, docstore to disk).

          So the concurrent docstores may work without too much changing
          of FSDir internals, as the portion of the docstore file that the
          SR needs to know about has been flushed to disk when SR is
          opened. The SR should then be able to open the docstore file
          cleanly regardless of the open IndexOutput adding more to the
          file for the next rambuf segment?

          Michael McCandless added a comment -

          The SR should then be able to open the docstore file
          cleanly regardless of the open IndexOutput adding more to the
          file for the next rambuf segment?

          That's the crucial question: can we open a new IndexInput, while an IndexOutput is still writing to the file? I vaguely remember being surprised that this worked fine on Windows, but I'm not sure.

          If that's fine across all OS's, then, yes we could avoid closing the docStores when flushing a new segment.

          If it's not fine, then we'd need a way to make an IndexOutputInput, which is a bigger change.

          We also should [separately] consider having multiple SegmentReaders that share the same docStores, share a single set of IndexInputs (cloned). Ie if the RAM MergePolicy allows many segments in RAM at once, we are still opening real file descriptors to read the doc stores, so without such sharing we could start running out of descriptors.
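
          A small sketch of that crucial question, using only the era's
          Directory API: open an IndexInput on a file whose IndexOutput is
          still open (the path and file name are illustrative):

             Directory dir = FSDirectory.getDirectory(new File("/tmp/nrt-test"));
             IndexOutput out = dir.createOutput("docstore.bin");
             out.writeInt(42);
             out.flush();                      // make the written bytes visible

             IndexInput in = dir.openInput("docstore.bin");  // while `out` is open
             int value = in.readInt();         // should read 42 if the OS allows it
             in.close();

             out.writeInt(43);                 // keep appending for the next segment
             out.close();
             dir.close();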

          Jason Rutherglen added a comment -

          I want to run the Lucene unit tests in NRT mode without creating and/or
          modifying all the test cases. In lieu of adding a
          System.property that turns NRT on, have we settled on a
          different mechanism for global settings? Perhaps the back compat
          type of system can be used here? Or for now a static variable on
          IW?

          can we open a new IndexInput, while an IndexOutput is
          still writing to the file?

          I ran a test case successfully that writes to a file while
          opening threads that read from flushed sections on windows.

          Closing docstores for every flush would seem to cause a lot of
          overhead. With NRT + FSD aren't termvector files merged on disk for
          every segment?

          We also should [separately] consider having multiple
          SegmentReaders that share the same docStores

          Doesn't FSDir open only one FD per file?

          Michael McCandless added a comment -

          I want to run the Lucene unit tests in NRT mode without creating and/or
          modifying all the test cases.

          You can just temporarily set the default then run all tests?

          We never reached consensus on a back compat sort of setting...

          I ran a test case successfully that writes to a file while
          opening threads that read from flushed sections on windows.

          Excellent; let's take this up under a new issue? If this holds true across platform (and Windows is the most challenging one) then it'll make sharing much easier.

          Closing docstores for every flush would seem to cause a lot of
          overhead. With NRT + FSD aren't termvector files merged on disk for
          every segment?

          Yes, it adds overhead, though it's not on the critical path for opening a new reader, since it happens in the BG. So... things like LUCENE-1526 (making deletions and norms incrementally copy-on-write) are, I think, higher priority.

          Doesn't FSDir open only one FD per file?

          No, it opens a real file every time openInput is called. I guess we could think about having it share/clone internally?

          Jason Rutherglen added a comment -

          You can just temporarily set the default then run all
          tests?

          Or have separate, parallel inherited test classes that set the
          static IW variable, run the tests, and perhaps add some
          NRT-specific test methods.

          I guess we could think about having it share/clone
          internally?

          Yeah we should, I got this confused with how we're cloning in
          most o.a.l.i classes, but if we're calling openInput it seems
          like we should have it internally clone, I guess we'll have an
          issue with classes that don't IndexInput.close, however it's
          better to solve these than open many unnecessary file
          descriptors.

          let's take this up under a new issue?

          The issue would only be the unit test for now, or should it be a
          part of an existing issue? Ok, it will clean up LUCENE-1313's
          unit test class.

          Jason Rutherglen added a comment -

          With flushToRAM=true, when IW.commit is called, do we still want
          concurrent merges to not execute synchronously? Or should we
          only wait for the merges that go from ramDir to primaryDir to
          complete?

          Michael McCandless added a comment -

          The issue would only be the unit test for now, or should it be a
          part of an existing issue?

          I meant the issue of FSDir sharing a single IndexInput if the same file is opened more than once (you opened LUCENE-1671 for this).

          I guess we'll have an issue with classes that don't IndexInput.close

          This (failing to close an IndexInput that was obtained from Directory.openInput) should be rare in Lucene – we try not to do that. Failing to close clones does happen...

          With flushToRAM=true, when IW.commit is called, do we still want
          to not have concurrent merges execute synchronously? Or only
          wait for the merges to complete that are from ramDir to
          primaryDir?

          I think only ramDir -> primaryDir? commit() today doesn't block on BG merges.

          Jason Rutherglen added a comment -

          For IW.resolveRAMSegments, maybe we should make a single
          merge that merges everything down? Why even bother interacting
          with a merge policy, here?

          I assumed LogMergePolicy.findMergesForOptimize would merge
          segments down to a single segment, but in testing
          resolveRAMSegments that doesn't always seem to be the case?

          When I created a merge of all segments down to one,
          ensureContiguousMerge would throw an exception. I thought about
          avoiding ensureContiguousMerge, but it required reworking a
          bunch of code. Why does ensureContiguousMerge exist? Is it for
          assertions, or is there a reason segments need to be next to
          each other when merging?

          Michael McCandless added a comment -

          LogMergePolicy.findMergesForOptimize I assumed would merge
          segments to a single segment, in testing resolveRAMSegments that
          doesn't always seem to be the case?

          It should, but it will respect mergeFactor in the process (ie do multiple merges if necessary). I'm thinking we should just merge all the RAM segments in one go, and not consult merge policy, in resolveRAMSegments.
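
          A minimal sketch of that "one go" merge, assuming the RAM
          segments sit on the tail of segmentInfos; firstRAMIndex is a
          hypothetical marker for where that tail begins:

             SegmentInfos ramTail = new SegmentInfos();
             for (int i = firstRAMIndex; i < segmentInfos.size(); i++) {
               ramTail.add(segmentInfos.info(i));  // in order, so the merge is contiguous
             }
             // 2.4-era ctor: OneMerge(SegmentInfos segments, boolean useCompoundFile).
             MergePolicy.OneMerge merge = new MergePolicy.OneMerge(ramTail, false);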

          When I created a mergeAllSegments to one, the
          ensureContiguousMerge would throw an exception.

          Why does this throw an exception? Ie you passed in the in-order RAM segments?

          Why does ensureContiguousMerge exist? Is it for
          assertions or is there a reason segments need to be next to each
          other when merging?

          I think the only thing that'd break is IndexWriter now assumes continuity when it removes the old segments and puts the new one in. LUCENE-1076 explores allowing MergePolicy to select non-contiguous merges.

          But: with autoCommit=false, in order to avoid merging the doc stores, the segments (even RAM segments) must be contiguous. This is a sizable performance gain when building a large index in one IndexWriter session.

          Jason Rutherglen added a comment -

          Just so we don't forget, we need to test flushToRAM with a custom IndexDeletionPolicy which could be a bit tricky.

          Jason Rutherglen added a comment -
          • RAM buffer size is stored in the writer rather than set into
            DocumentsWriter. This is due to the actual ram buffer limit in
            NRT changing depending on the size of the ramdir.
          • NRTMergePolicy and IW.resolveRAMSegments merges all ram dir
            segments to primaryDir (i.e. disk) when the ramDir is over
            totalMax, or any new merges would put ramDir over totalMax.
          • In DocumentsWriter we have a set limit on the buffer size
            which is (tempMax - ramDirSize)/2. This keeps the total ram used
            under the totalMax (or IW.maxBufferSize), while also keeping our
            temporary ram usage under the tempMax amount. When the
            DW.ramBuffer limit is reached, it's auto-flushed to the ramDir
            (see the worked example after this list).
          • All tests pass except TestIndexWriterRAMDir.testFSDirectory.
            Will look into this further. When flushToRAM is on by default,
            there seems to be a deadlock in
            org.apache.lucene.TestMergeSchedulerExternal; however, when I
            tried to confirm it via jconsole by setting
            ANT_OPTS="-Dcom.sun.management.jmxremote", I didn't see any.
            I'm not sure if this is due to not connecting to the right
            process, or something else.
          • Added testReadDocuments, which ensures we can read documents
            we've flushed to disk. This essentially tests our ability to
            simultaneously read and write documents to and from the
            docstore. It seemed to work on Windows.
          • I think there's more that can be done to manage the RAM more
            accurately; however, the way it works now is a good starting
            point.
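
          A worked example of the (tempMax - ramDirSize)/2 limit above,
          with illustrative numbers:

             long tempMax    = 32 << 20;                   // 32 MB temporary budget
             long ramDirSize = 16 << 20;                   // current ram dir size
             long dwLimit    = (tempMax - ramDirSize) / 2; // 8 MB DW buffer limit
             // Worst case while the buffer flushes into the ram dir:
             // 16 MB (ramDir) + 8 MB (buffer) + 8 MB (flushed copy) = 32 MB = tempMax.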
          Mark Miller added a comment -

          What's the verdict on this one, Mike? I got the impression this was likely for 3.1...

          Michael McCandless added a comment -

          OK, let's push it to 3.1. It's very much in progress, but 1) the iterations are slow (it's a big patch), and 2) it's a biggish change, so I'd prefer to land it shortly after a release, not shortly before, so it has plenty of time to "bake" on trunk.

          Jason Rutherglen added a comment -

          Just wanted to give an update on this. I'm running the unit
          tests with flushToRAM=true; the ones that fail are (mostly)
          tests that look for files that are now (temporarily) in RAM,
          and the like. I'm not sure what to do with these tests:
          1) ignore them (it's kind of hard to skip specific methods, I
          think), or 2) conditionalize them to run only if
          flushToRAM=false.

          Jason Rutherglen added a comment -

          TestThreadedOptimize is throwing an ensureContiguousMerge
          exception. I think this highlights that the change to merge
          all ram segments to a single primaryDir segment can sometimes
          lead to choosing segments that are non-contiguous? I'm not sure
          of the best way to handle this.

          Michael McCandless added a comment -

          I think this is highlighting the change to merging
          all ram segments to a single primaryDir segment can sometimes
          lead to choosing segments that are non-contiguous?

          I don't see why that results in a non-contiguous merge? Ie, at all times the RAM segments should be on the tail of SegmentInfos? So if you merge all of them, in order, that merge should be contiguous?

          Michael McCandless added a comment -

          conditionalize them to run only if flushToRAM=false.

          That seems good?
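
          A minimal sketch of conditionalizing such a test; the static
          flag here stands in for whatever global NRT switch the patch
          ends up using (the GLOBALNRT variable mentioned below):

             public void testFileNames() throws Exception {
               if (IndexWriter.GLOBALNRT) {
                 return;  // files live in the RAMDir under flushToRAM; skip this check
               }
               // ... original assertions about on-disk file names ...
             }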

          Jason Rutherglen added a comment -

          The patch is cleaned up. A static variable IndexWriter.GLOBALNRT
          is added, which allows all the tests to be run with
          flushToRAM=true. I reran the tests which hopefully still work as
          intended. Tests that looked for specific file names were changed
          to work with NRT. Some of the tests are skipped entirely and
          need to be written specifically for flushToRAM.

          • TestIndexWriterMergePolicy and TestBackwardsCompatibility
            failures are expected
          • TestIndexWriterRAMDir.testFSDirectory fails (will be fixed)
          • TestThreadedOptimize's ensureContiguousMerge fails. This one
            is a bit mysterious; perhaps the correct assertion will show
            where it's going wrong.

          I need to go through and mark the tests that can be converted to
          be NRT specific.

          Jason Rutherglen added a comment -
          • TestThreadedOptimize passes. LogMergePolicy now filters the
            segmentInfos based on the dir, rather than NRTMergePolicy
            passing in only ramInfos or primaryInfos. LogMergePolicy is
            careful to select contiguous segments; by passing in only a
            subset of segmentInfos, the merge policy's selection broke
            down.
          • TestIndexWriter.testAddIndexOnDiskFull and
            testAddIndexesWithCloseNoWait fail, which I don't think
            happened before. testAddIndexOnDiskFull fails when
            autoCommit=true, which I'm not sure will be a valid test by
            the time this patch goes in, but it probably needs to be
            looked into.

          The other previous notes are still valid.

          Jason Rutherglen added a comment -

          Using a single segmentInfos in IW seems to be problematic, as
          we'll always potentially have non-contiguous infos from the two
          directories. I'm seeing the error off and on in different test
          cases. I will put together a patch separating the two dirs'
          infos in IW.

          Jason Rutherglen added a comment -

          On second thought, the previous idea would require quite a bit
          of work. Perhaps OneMerge can record the segmentInfos (ramDir or
          primaryDir) it was selected from, and ensureContiguousMerge can
          verify against that? Then we'd adjust commitMerge to remove the
          newly merged segments individually.

          I'll give this a try.

          Jason Rutherglen added a comment -

          It's progressing. Some tests fail randomly, such as the ones noted below.

          • TestIndexWriter.testAddIndexesWithCloseNoWait fails with
            "rollback() was called or addIndexes* hit an unhandled
            exception". TestCrash.testWriterAfterCrash fails with:
              [junit] java.io.FileNotFoundException: _a.fnm
              [junit]   at org.apache.lucene.store.MockRAMDirectory.openInput(MockRAMDirectory.java:252)
              [junit]   at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:67)
          • The assert in the ctor of MergeDocIDRemapper is removed (not
            yet sure how to replace it).
          • OneMerge.fromInfos is added, which is the set of segmentInfos
            the merge was selected from. This is for ensureContiguousMerge,
            which was failing because we have essentially two different
            sets of segmentInfos (ram and primaryDir) in IW.segmentInfos.
            They are not related, but for convenience are kept together for
            most of IW, then are separated out in the merge policy. If the
            goal of ensureContiguousMerge is to keep docStoreSegments
            together, this will work, as ramDir and primaryDir docStores
            should not need to be adjacent (I think, and need to verify).
          Jason Rutherglen added a comment -

          The expected tests fail; the previously failing tests pass on
          2 runs. I didn't make any changes though, so I'm suspicious!

          Jason Rutherglen added a comment -

          Benchmark of LUCENE-1313

          Jason Rutherglen added a comment -

          Patch is updated to trunk and compiles. Unit tests fail because
          they're hardcoded for specific numbers of files, etc.; otherwise
          I don't think there's anything glaring.

          The included benchmark indexes 1 document, queries with a sort,
          and repeats N times. This is far too taxing using the existing
          IW.getReader when pointing to a hard drive; the machine
          basically becomes unusable. I haven't figured out what to
          publish or how long I want to wait, so I'll just run it
          overnight.

          This patch's benchmark fares much better; I'll publish its
          results as well. I'm thinking 1, 10, 25, 50, 100, and 1000 doc
          increments should suffice, so there's some variation. (A sketch
          of the benchmark loop follows.)
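
          A hedged sketch of that benchmark loop; the field names,
          analyzer, and iteration count are illustrative:

             IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                                                  IndexWriter.MaxFieldLength.UNLIMITED);
             for (int i = 0; i < numIterations; i++) {
               Document doc = new Document();
               doc.add(new Field("id", Integer.toString(i), Field.Store.YES,
                                 Field.Index.NOT_ANALYZED));
               writer.addDocument(doc);

               IndexReader reader = writer.getReader();   // NRT reader
               IndexSearcher searcher = new IndexSearcher(reader);
               Sort sort = new Sort(new SortField("id", SortField.STRING));
               searcher.search(new MatchAllDocsQuery(), null, 10, sort);
               searcher.close();
               reader.close();
             }
             writer.close();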

          Jason Rutherglen added a comment -

          Ah, missing file now included.

          Jason Rutherglen added a comment -

          I think this patch has a memory leak. I'm trying to get an open source YourKit license for profiling the heap usage. The patch is unfortunately quite large, so whatever the leak is, it's probably easy to fix and hard to find.

          Jason Rutherglen added a comment -

          Realizing the previous patch's approach had grown too
          complicated, this is a far simpler implementation that fulfills
          the same goal: batching segments in RAM until they exceed a
          given maximum size, then merging those RAM segments to a primary
          directory (i.e. disk), all the while allowing all segments to be
          searchable with minimum-latency turnaround.

          • Segment names are generated for the ram writer from the
            primary writer; this ensures name continuity. Actually, I'm not
            sure this is necessary anymore.
          • The problem is that when the ram segments are merged into the
            primary writer, they appear to be non-contiguous. Some of the
            contiguous-segment checking has been relaxed for this case, and
            needs to be conditional on the merged segments being from the
            ram dir. Perhaps we can have our cake and eat it too here by
            keeping the contiguous check around for all other cases?
          • When the ram writer's usage exceeds a specified size, the ram
            buffer is flushed, and the ram segments are synchronously merged
            to the primary writer using a mechanism similar to
            addIndexesNoOptimize (see the sketch after this list).
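
          A hedged sketch of the two-writer batching described in the last
          bullet; maxRAMBytes and the rebuild of the ram writer are
          illustrative details, not the patch's actual API:

             RAMDirectory ramDir = new RAMDirectory();
             IndexWriter ramWriter = new IndexWriter(ramDir, analyzer,
                 IndexWriter.MaxFieldLength.UNLIMITED);
             IndexWriter primaryWriter = new IndexWriter(primaryDir, analyzer,
                 IndexWriter.MaxFieldLength.UNLIMITED);

             void add(Document doc) throws IOException {
               ramWriter.addDocument(doc);
               if (ramDir.sizeInBytes() > maxRAMBytes) {   // RAM batch over budget
                 ramWriter.commit();                       // stabilize the ram segments
                 primaryWriter.addIndexesNoOptimize(new Directory[] { ramDir });
                 // ... then recreate ramWriter on a fresh RAMDirectory ...
               }
             }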
          Jason Rutherglen added a comment -
          • The ensureContiguous check is mostly back
          • Cleaned up the code and made the flush method non-synchronized
          • There's a subtle synchronization bug causing files to not be found in the testRandomThreads method
          • There's excessive merge logging left in to debug the sync issue
          Jason Rutherglen added a comment -

          I wanted to simplify a little more, to more easily understand
          the edge cases that fail. In the multithreaded test, files were
          sometimes left open, which is hard for me to debug.

          The TestNRT.testSimple method passes; however, IndexFileDeleter
          is complaining about not being able to delete files when
          expected, which is shown in the IW.infoStream.

          The NRT.flush method creates a merge of all the ram segments,
          then calls IW.mergeIn to manually merge the ram segments into
          the writer. OneMerge contains the writer where the segment
          readers should be obtained from. In this case, the primary
          writer obtains the readers from the ram writer's readerPool.
          This is important because deletes may be coming in as we're
          merging. However I'm not sure this will work without a shared
          lock between the writers for commitMergedDeletes which requires
          syncing.

          Mike, can you take a look to see if this path will work?

          Jason Rutherglen added a comment -

          The tests pass, and the previous kinks seem to be worked out.
          Actually, there is still one issue: in waitForMerges, the assert
          that the mergingSegments size equals zero occasionally fails. I
          think this is a small sync problem caused by the manual merge
          between the two writers.

          I'll run the multithreaded test at a longer interval to see
          what other errors may crop up. Once it runs successfully for,
          let's say, 30 minutes, we can beef up the stress testing of this
          patch by doing concurrent updates, deletes, etc.

          Jason Rutherglen added a comment -

          Alright, the issue was simple: OneMerge.registerDone was being set to false by the primary writer, so the ram writer wasn't removing the infos from mergingSegments in mergeFinish.

          Jason Rutherglen added a comment -

TestNRTReaderWithThreads2 fails periodically; it's just another synchronization issue. I added syncing on the merge writer in methods like commitMergedDeletes and commitMerge. Perhaps more of that type of syncing needs to be added; it can take time to figure these issues out.
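
The syncing presumably takes a shape like this - a sketch under the assumption that OneMerge carries the RAM writer, as in the earlier comment, and not the actual patch code:

{code:java}
// Sketch: the primary writer locks the merge (RAM) writer before its own
// monitor, so deletes applied through the RAM writer's readerPool cannot
// interleave with committing the merged deletes. Lock order must be fixed
// (merge writer first) everywhere to avoid deadlock.
void commitMerge(MergePolicy.OneMerge merge) throws IOException {
  IndexWriter mergeWriter = merge.writer; // hypothetical patch field
  synchronized (mergeWriter) {
    synchronized (this) {
      commitMergedDeletes(merge);
      // ... install the merged segment, checkpoint, notify the deleter ...
    }
  }
}
{code}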

There are also remnants of a first attempt at transparently utilizing the NRT class within IW.

          Jason Rutherglen added a comment -

          New assertions in NRT.flush are catching the issue that's occurring.

          Michael McCandless added a comment -

I'll try to have a look at this soon, Jason! Sounds like good progress...

          Jason Rutherglen added a comment -

          OK, all the tests pass consistently now.

I guess the next feature is to have NRT.flush execute in a single background thread rather than blocking update-document calls.
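
Something like a single-threaded executor would do it - a sketch, where the executor wiring is my assumption rather than anything in the patch:

{code:java}
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: one background thread runs NRT.flush so addDocument/updateDocument
// callers never block behind a RAM-to-disk flush; a single thread also keeps
// flushes from overlapping each other.
class BackgroundFlusher {
  private final ExecutorService flushExecutor = Executors.newSingleThreadExecutor();
  private volatile IOException flushFailure; // surfaced to the indexing thread later

  void flushInBackground(final Runnable nrtFlush) {
    flushExecutor.submit(new Runnable() {
      public void run() {
        try {
          nrtFlush.run(); // the NRT.flush path described earlier
        } catch (RuntimeException e) {
          // record the failure and rethrow on the caller's next operation
          flushFailure = new IOException("background flush failed: " + e);
        }
      }
    });
  }
}
{code}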

          Jason Rutherglen added a comment -

          I turned on assert !sr.hasChanges in readerPool.release, and it fails sometimes. I'm not quite sure why yet.

          Jason Rutherglen added a comment -

Well, I added the background thread for NRT.flush; I've also been debugging this assert !sr.hasChanges issue, which occurs once out of 7000 iterations and is fairly minor. Hmm... applying deletes shouldn't really conflict, so I'm hoping this isn't a pre-existing bug unrelated to LUCENE-1313.

          Jason Rutherglen added a comment -

This patch includes flushing in a background thread. Some formatting has been cleaned up and javadocs added.

I ran TestNRTReaderWithThreads2 a couple of times for kicks and didn't see the assert !sr.hasChanges error. I'll probably focus on adding more stress testing.

          Jason Rutherglen added a comment -

I went back to trying to utilize a RAM dir inside of IW. This actually works well now, and the code is less intrusive than the previous patches. The incoming directory is wrapped in a PrefixSwitchDirectory, which lets the caller indicate whether a segment is destined for the RAM directory or the primary directory.
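
PrefixSwitchDirectory isn't in trunk; from the description it sounds like Lucene's FileSwitchDirectory, but keyed on the segment-name prefix rather than the file extension. A guessed sketch of the routing core (not the patch class; a real version would wrap this in a Directory subclass delegating every method through dirFor):

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.store.Directory;

// Guessed sketch, not the patch class: pick the RAM directory for files whose
// segment prefix has been marked RAM-bound, the primary directory otherwise.
class PrefixRouter {
  private final Directory primary;
  private final Directory ram;
  private final Set<String> ramSegments = Collections.synchronizedSet(new HashSet<String>());

  PrefixRouter(Directory primary, Directory ram) {
    this.primary = primary;
    this.ram = ram;
  }

  /** The caller indicates that a segment is destined for the RAM directory. */
  void markRamSegment(String segmentName) {
    ramSegments.add(segmentName);
  }

  Directory dirFor(String fileName) {
    // Index files are named "<segment>.<ext>"; route on the segment part.
    int dot = fileName.indexOf('.');
    String prefix = dot < 0 ? fileName : fileName.substring(0, dot);
    return ramSegments.contains(prefix) ? ram : primary;
  }
}
{code}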

A single internal segment infos collection is used: the first half holds the primary-dir segments, the second half the RAM-dir segments.

The above-mentioned changes of course break many unit tests. I'm going through and evaluating what to do on a case-by-case basis, and am open to suggestions.

          Jingkei Ly added a comment -

I've just tried applying this patch to my checked-out version of trunk (revision 895585), but it appears the PrefixSwitchDirectory class is missing - is there another patch needed to get this working?

          Jason Rutherglen added a comment -

Jingkei, sorry - please try this patch...

          Jingkei Ly added a comment -

          Thanks Jason, the patch applies cleanly now.

I've tested it out, but in my extreme use case (one document is added to the index and then made immediately available for searching, many times a second) the patch version seems to be about 1/3rd slower than the current NRT implementation in Lucene 3.0, with most of the time lost in searching. I'd expected the RAMDirectory implementation to be faster - is what I'm seeing against your expectations too?

          Jason Rutherglen added a comment -

          > the patch version seems to be about 1/3rd slower than the current NRT implementation in Lucene 3.0

          How are you measuring the 1/3? Can you post your test? It may be helpful to create a similar test using Lucene's benchmark code.

          > with most of the time lost in searching

Is it QPS that's dropping, or query times that are rising?

          Jingkei Ly added a comment -

          Jason,

I've attached the benchmarking test my observations are based on. My use case is to add one document to the index, perform a search immediately afterwards, and be guaranteed that the document I just added is visible in the search (which is why it is done synchronously in one thread).

Apologies for the sysouts on which I'm basing my observations - the main point is that adding a few thousand docs in this fashion appears to be faster in Lucene 3.0 than the equivalent with trunk+patch. The 1/3rd-slower observation is based on the fact that on my machine (quad-core, 4GB RAM, Windows XP), Lucene 3.0 is able to process 35 docs per second in the fashion I've described, whereas trunk + patch runs at 22 docs per second.

Having rerun the test, it appears the average time for a query to complete is somewhat slower, but it's actually the average time to get a new realtime reader that is much slower.

          I haven't ruled out having done something really stupid in my test, in which case I apologise in advance!
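
For anyone without the attachment, the loop is roughly the following - my reconstruction from the description above using the stock Lucene 3.0 API, not the attached TestLuceneNRT.java; 'dir' and 'analyzer' are assumed to be set up elsewhere:

{code:java}
// Reconstruction of the described benchmark: add one doc, reopen the NRT
// reader, and verify the doc is searchable, measuring docs/sec overall.
IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
IndexReader reader = writer.getReader(); // NRT reader (Lucene 2.9+)
int numDocs = 5000;
long start = System.currentTimeMillis();
for (int i = 0; i < numDocs; i++) {
  Document doc = new Document();
  doc.add(new Field("id", Integer.toString(i), Field.Store.NO, Field.Index.NOT_ANALYZED));
  writer.addDocument(doc);

  IndexReader newReader = reader.reopen(); // must see the doc just added
  if (newReader != reader) {
    reader.close();
    reader = newReader;
  }
  IndexSearcher searcher = new IndexSearcher(reader);
  TopDocs hits = searcher.search(new TermQuery(new Term("id", Integer.toString(i))), 1);
  if (hits.totalHits != 1) {
    throw new IllegalStateException("doc " + i + " not visible after reopen");
  }
}
long elapsed = System.currentTimeMillis() - start;
System.out.println("docs/sec: " + (numDocs * 1000L / elapsed));
{code}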

          Jason Rutherglen added a comment -

> Lucene 3.0 is able to process 35 docs per second in the fashion I've described, whereas trunk + patch runs at 22 docs per second.

I glanced at the benchmark posted; it'll take some digging into. I ran tests like this before (add 1 doc, do a search) and LUCENE-1313 was faster than trunk/3.0. On Windows XP it was definitely faster before (I saw XP perform lots of disk access when creating small files), so perhaps there's something else going on.

          Jason Rutherglen added a comment -

Won't be working on these, and they're old.


People

  • Assignee: Unassigned
  • Reporter: Jason Rutherglen
  • Votes: 2
  • Watchers: 20
