
Key: LUCENE-1313
Type: New Feature
Status: Open
Priority: Minor
Assignee: Unassigned
Reporter: Jason Rutherglen
Votes: 1
Watchers: 17
Lucene - Java

Near Realtime Search (using a built in RAMDirectory)

Created: 22/Jun/08 05:02 PM   Updated: 24/Nov/09 11:53 PM
Component/s: Index
Affects Version/s: 2.4.1
Fix Version/s: 3.1

Time Tracking:
Not Specified

File Attachments:
Java Archive File Licensed for inclusion in ASF works LUCENE-1313.jar 2009-04-08 12:22 AM Jason Rutherglen 5 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-24 11:53 PM Jason Rutherglen 35 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-09 08:47 PM Jason Rutherglen 60 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-05 09:43 PM Jason Rutherglen 39 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-05 07:33 PM Jason Rutherglen 37 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-05 06:34 AM Jason Rutherglen 36 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-05 02:15 AM Jason Rutherglen 22 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-05 02:05 AM Jason Rutherglen 22 kB
Text File LUCENE-1313.patch 2009-11-04 07:51 PM Jason Rutherglen 14 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-03 12:37 AM Jason Rutherglen 36 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-11-02 07:04 PM Jason Rutherglen 33 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-09-23 02:11 AM Jason Rutherglen 175 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-09-23 02:03 AM Jason Rutherglen 166 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-06-30 09:28 PM Jason Rutherglen 175 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-06-22 10:16 PM Jason Rutherglen 175 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-06-18 10:52 PM Jason Rutherglen 173 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-06-18 12:55 AM Jason Rutherglen 170 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-06-05 04:44 AM Jason Rutherglen 131 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-05-19 09:59 PM Jason Rutherglen 129 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-05-12 03:20 AM Jason Rutherglen 109 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-05-05 12:32 AM Jason Rutherglen 69 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-05-01 06:38 PM Jason Rutherglen 47 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-04-30 11:19 PM Jason Rutherglen 52 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-04-30 09:59 PM Jason Rutherglen 49 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-04-30 08:25 PM Jason Rutherglen 37 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-04-20 09:09 PM Jason Rutherglen 21 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-04-17 08:27 PM Jason Rutherglen 14 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2009-04-01 10:48 PM Jason Rutherglen 12 kB
Text File Licensed for inclusion in ASF works LUCENE-1313.patch 2008-10-01 06:08 PM Jason Rutherglen 473 kB
Text File Licensed for inclusion in ASF works lucene-1313.patch 2008-07-17 02:35 PM Jason Rutherglen 474 kB
Text File Licensed for inclusion in ASF works lucene-1313.patch 2008-06-24 09:31 PM Jason Rutherglen 698 kB
Text File Licensed for inclusion in ASF works lucene-1313.patch 2008-06-24 12:21 AM Jason Rutherglen 680 kB
Text File Licensed for inclusion in ASF works lucene-1313.patch 2008-06-22 05:06 PM Jason Rutherglen 1.88 MB

Lucene Fields: Patch Available, New


Description
Enable near realtime (NRT) search in Lucene without external
dependencies. When RAM NRT is enabled, the implementation adds a
RAMDirectory to IndexWriter. Flushes go to the RAMDirectory unless
it has no available space. Merges are completed in the
RAMDirectory until there is no more available RAM.

IW.optimize and IW.commit flush the RAMDirectory to the primary
directory; all other operations try to keep segments in RAM
until there is no more space.
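
The flush and commit rules above can be sketched as a small toy model. This is not code from any attached patch and it does not use the Lucene API: the class name RamFirstSegmentStore, its methods, and the byte budget are all hypothetical, segments are reduced to their sizes, and merging is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a RAM-first flush policy. Hypothetical; not Lucene code. */
class RamFirstSegmentStore {
    private final long ramBudgetBytes;   // assumed cap on the RAM directory
    private long ramUsedBytes = 0;
    private final List<Long> ramSegments = new ArrayList<>();   // sizes of segments held in RAM
    private final List<Long> diskSegments = new ArrayList<>();  // sizes of segments in the primary directory

    RamFirstSegmentStore(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /** Flushes a segment to RAM if it fits, otherwise to the primary directory. Returns true if it stayed in RAM. */
    boolean flush(long segmentBytes) {
        if (ramUsedBytes + segmentBytes <= ramBudgetBytes) {
            ramSegments.add(segmentBytes);
            ramUsedBytes += segmentBytes;
            return true;
        }
        diskSegments.add(segmentBytes);
        return false;
    }

    /** Models commit (and optimize): every RAM segment moves to the primary directory. */
    void commit() {
        diskSegments.addAll(ramSegments);
        ramSegments.clear();
        ramUsedBytes = 0;
    }

    int ramSegmentCount() { return ramSegments.size(); }
    int diskSegmentCount() { return diskSegments.size(); }
    long ramUsedBytes() { return ramUsedBytes; }

    public static void main(String[] args) {
        RamFirstSegmentStore store = new RamFirstSegmentStore(100);
        System.out.println(store.flush(60)); // fits within the budget
        System.out.println(store.flush(60)); // would exceed the budget, so it goes to disk
        store.commit();
        System.out.println(store.ramSegmentCount() + " in RAM, " + store.diskSegmentCount() + " on disk");
    }
}
```

In this model a segment that does not fit goes straight to the primary directory, so later searches over it would not benefit from RAM. Whether the patch evicts older RAM segments first is not stated in the description.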



Change History (ascending order)
Jason Rutherglen made changes - 22/Jun/08 05:06 PM
Field Original Value New Value
Attachment lucene-1313.patch [ 12384453 ]
Jason Rutherglen made changes - 24/Jun/08 12:21 AM
Attachment lucene-1313.patch [ 12384545 ]
Jason Rutherglen made changes - 24/Jun/08 09:31 PM
Attachment lucene-1313.patch [ 12384629 ]
Jason Rutherglen made changes - 17/Jul/08 02:35 PM
Attachment lucene-1313.patch [ 12386303 ]
Jason Rutherglen made changes - 01/Oct/08 06:08 PM
Attachment LUCENE-1313.patch [ 12391299 ]
Jason Rutherglen made changes - 01/Apr/09 09:06 PM
Component/s contrib/* [ 12312028 ]
Component/s Index [ 12310232 ]
Fix Version/s 2.9 [ 12312682 ]
Priority Major [ 3 ] Minor [ 4 ]
Description Provides realtime search using Lucene. Conceptually, updates are divided into discrete transactions. The transaction is recorded to a transaction log which is similar to the mysql bin log. Deletes from the transaction are made to the existing indexes. Document additions are made to an in memory InstantiatedIndex. The transaction is then complete. After each transaction TransactionSystem.getSearcher() may be called which allows searching over the index including the latest transaction.

TransactionSystem is the main class. Methods similar to IndexWriter are provided for updating. getSearcher returns a Searcher class.

- getSearcher()
- addDocument(Document document)
- addDocument(Document document, Analyzer analyzer)
- updateDocument(Term term, Document document)
- updateDocument(Term term, Document document, Analyzer analyzer)
- deleteDocument(Term term)
- deleteDocument(Query query)
- commitTransaction(List<Document> documents, Analyzer analyzer, List<Term> deleteByTerms, List<Query> deleteByQueries)

Sample code:

{code}
// setup
FSDirectoryMap directoryMap = new FSDirectoryMap(new File("/testocean"), "log");
LogDirectory logDirectory = directoryMap.getLogDirectory();
TransactionLog transactionLog = new TransactionLog(logDirectory);
TransactionSystem system = new TransactionSystem(transactionLog, new SimpleAnalyzer(), directoryMap);

// transaction
Document d = new Document();
d.add(new Field("contents", "hello world", Field.Store.YES, Field.Index.TOKENIZED));
system.addDocument(d);

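// search
// example query; the original snippet left "query" undefined
Query query = new TermQuery(new Term("contents", "hello"));
OceanSearcher searcher = system.getSearcher();
ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
System.out.println(hits.length + " total results");
// print at most the first 10 hits
for (int i = 0; i < hits.length && i < 10; i++) {
  Document hitDoc = searcher.doc(hits[i].doc);
  System.out.println(i + " " + hits[i].score + " " + hitDoc.get("contents"));
}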
{code}

There is a test class org.apache.lucene.ocean.TestSearch that was used for basic testing.

A sample disk directory structure is as follows:

|/snapshot_105_00.xml | XML file listing the indexes, and their generation numbers, that make up a snapshot. Each transaction creates a new snapshot file. In this file name, 105 is the snapshotid, also known as the transactionid, and 00 is the minor version of the snapshot, which corresponds to a merge. A merge is a minor snapshot version because the data does not change, only the underlying structure of the index|
|/3 | Directory containing an on disk Lucene index|
|/log | Directory containing log files|
|/log/log00000001.bin | Log file. As new log files are created the suffix number is incremented|
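
The snapshot file naming in the table above can be illustrated with a short parser. This is a minimal sketch, assuming the name always has the form snapshot_<snapshotid>_<minor>.xml; the class SnapshotFileName and its fields are hypothetical and are not taken from the Ocean code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that splits a snapshot file name into its two numbers. */
class SnapshotFileName {
    // Assumed pattern: snapshot_<snapshotid>_<minor version>.xml
    private static final Pattern NAME = Pattern.compile("snapshot_(\\d+)_(\\d+)\\.xml");

    final long snapshotId;    // also called the transactionid in the description
    final int minorVersion;   // incremented by merges, which do not change the data

    private SnapshotFileName(long snapshotId, int minorVersion) {
        this.snapshotId = snapshotId;
        this.minorVersion = minorVersion;
    }

    /** Parses a bare file name (no directory part); rejects anything that does not match. */
    static SnapshotFileName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a snapshot file name: " + fileName);
        }
        return new SnapshotFileName(Long.parseLong(m.group(1)), Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        SnapshotFileName name = SnapshotFileName.parse("snapshot_105_00.xml");
        System.out.println(name.snapshotId + " " + name.minorVersion);
    }
}
```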

Realtime search with transactional semantics.

Possible future directions:
  * Optimistic concurrency
  * Replication

Encoding each transaction into a set of bytes by writing to a RAMDirectory enables replication. It is difficult to replicate using other methods because while the document may easily be serialized, the analyzer cannot.

I think this issue can hold realtime benchmarks which include indexing and searching concurrently.
Affects Version/s 2.4.1 [ 12313516 ]
Summary Ocean Realtime Search Realtime Search
Jason Rutherglen made changes - 01/Apr/09 10:48 PM
Attachment LUCENE-1313.patch [ 12404393 ]
Jason Rutherglen made changes - 08/Apr/09 12:22 AM
Attachment LUCENE-1313.jar [ 12404907 ]
Jason Rutherglen made changes - 17/Apr/09 08:27 PM
Attachment LUCENE-1313.patch [ 12405804 ]
Jason Rutherglen made changes - 20/Apr/09 09:09 PM
Attachment LUCENE-1313.patch [ 12405961 ]
Yonik Seeley made changes - 27/Apr/09 08:49 PM
Link This issue depends upon LUCENE-1618 [ LUCENE-1618 ]
Jason Rutherglen made changes - 30/Apr/09 08:25 PM
Attachment LUCENE-1313.patch [ 12406953 ]
Jason Rutherglen made changes - 30/Apr/09 09:59 PM
Attachment LUCENE-1313.patch [ 12406961 ]
Jason Rutherglen made changes - 30/Apr/09 11:19 PM
Attachment LUCENE-1313.patch [ 12406973 ]
Jason Rutherglen made changes - 01/May/09 06:38 PM
Attachment LUCENE-1313.patch [ 12407030 ]
Jason Rutherglen made changes - 05/May/09 12:32 AM
Attachment LUCENE-1313.patch [ 12407201 ]
Jason Rutherglen made changes - 12/May/09 03:20 AM
Attachment LUCENE-1313.patch [ 12407841 ]
Jason Rutherglen made changes - 19/May/09 09:59 PM
Attachment LUCENE-1313.patch [ 12408521 ]
Jason Rutherglen made changes - 20/May/09 04:31 AM
Description Realtime search with transactional semantics.

Possible future directions:
  * Optimistic concurrency
  * Replication

Encoding each transaction into a set of bytes by writing to a RAMDirectory enables replication. It is difficult to replicate using other methods because while the document may easily be serialized, the analyzer cannot.

I think this issue can hold realtime benchmarks which include indexing and searching concurrently.
Enable near realtime search in Lucene without external
dependencies. When RAM NRT is enabled, the implementation adds a
RAMDirectory to IndexWriter. Flushes go to the ramdir unless
there is no available space. Merges are completed in the ram
dir until there is no more available ram.

IW.optimize and IW.commit flush the ramdir to the primary
directory, all other operations try to keep segments in ram
until there is no more space.
Jason Rutherglen made changes - 28/May/09 07:53 PM
Link This issue blocks LUCENE-1667 [ LUCENE-1667 ]
Jason Rutherglen made changes - 05/Jun/09 04:44 AM
Attachment LUCENE-1313.patch [ 12409931 ]
Michael McCandless made changes - 15/Jun/09 03:35 PM
Fix Version/s 2.9 [ 12312682 ]
Fix Version/s 3.1 [ 12314025 ]
Jason Rutherglen made changes - 15/Jun/09 05:45 PM
Summary Realtime Search Near Realtime Search
Jason Rutherglen made changes - 18/Jun/09 12:55 AM
Attachment LUCENE-1313.patch [ 12411017 ]
Jason Rutherglen made changes - 18/Jun/09 10:52 PM
Attachment LUCENE-1313.patch [ 12411152 ]
Jason Rutherglen made changes - 22/Jun/09 10:16 PM
Attachment LUCENE-1313.patch [ 12411466 ]
Jason Rutherglen made changes - 30/Jun/09 09:28 PM
Attachment LUCENE-1313.patch [ 12412209 ]
Jason Rutherglen made changes - 14/Jul/09 06:38 PM
Summary Near Realtime Search Near Realtime Search (using a built in RAMDirectory)
Jason Rutherglen made changes - 15/Jul/09 12:50 AM
Link This issue blocks LUCENE-1738 [ LUCENE-1738 ]
Jason Rutherglen made changes - 15/Jul/09 01:21 AM
Link This issue blocks SOLR-1278 [ SOLR-1278 ]
Jason Rutherglen made changes - 28/Aug/09 05:24 PM
Link This issue relates to LUCENE-1577 [ LUCENE-1577 ]
Jason Rutherglen made changes - 23/Sep/09 02:03 AM
Attachment LUCENE-1313.patch [ 12420342 ]
Jason Rutherglen made changes - 23/Sep/09 02:11 AM
Attachment LUCENE-1313.patch [ 12420343 ]
Jason Rutherglen made changes - 02/Nov/09 07:04 PM
Attachment LUCENE-1313.patch [ 12423842 ]
Jason Rutherglen made changes - 03/Nov/09 12:37 AM
Attachment LUCENE-1313.patch [ 12423869 ]
Jason Rutherglen made changes - 04/Nov/09 07:51 PM
Attachment LUCENE-1313.patch [ 12424050 ]
Jason Rutherglen made changes - 05/Nov/09 02:05 AM
Attachment LUCENE-1313.patch [ 12424086 ]
Jason Rutherglen made changes - 05/Nov/09 02:15 AM
Attachment LUCENE-1313.patch [ 12424087 ]
Jason Rutherglen made changes - 05/Nov/09 06:34 AM
Attachment LUCENE-1313.patch [ 12424106 ]
Jason Rutherglen made changes - 05/Nov/09 07:33 PM
Attachment LUCENE-1313.patch [ 12424147 ]
Jason Rutherglen made changes - 05/Nov/09 09:43 PM
Attachment LUCENE-1313.patch [ 12424156 ]
Jason Rutherglen made changes - 09/Nov/09 08:47 PM
Attachment LUCENE-1313.patch [ 12424392 ]
Jason Rutherglen made changes - 24/Nov/09 11:53 PM
Attachment LUCENE-1313.patch [ 12426035 ]