
Key: LUCENE-550
Type: New Feature
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Grant Ingersoll
Reporter: Karl Wettin
Votes: 1
Watchers: 4
Lucene - Java

InstantiatedIndex - faster but memory consuming index

Created: 20/Apr/06 12:46 PM   Updated: 13/Mar/08 12:33 PM
Component/s: Store
Affects Version/s: 2.0.0
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  BinarySearchUtils.Apache.java (Java source file, 10 kB) - 2008-02-25 06:38 PM, Olivier Chafik
  LUCENE-550.patch (text file, 113 kB) - 2008-03-09 04:44 AM, Karl Wettin
  LUCENE-550.patch (text file, 112 kB) - 2008-03-09 03:09 AM, Karl Wettin
  LUCENE-550.patch (text file, 112 kB) - 2008-03-08 11:36 PM, Grant Ingersoll
  LUCENE-550_20071021_no_core_changes.txt (text file, 109 kB) - 2007-10-21 03:44 PM, Karl Wettin
  test-reports.zip (zip archive, 90 kB) - 2007-01-15 09:31 AM, Hoss Man
Image Attachments:

1. classdiagram.png
(61 kB)

2. HitCollectionBench.jpg
(156 kB)
Issue Links:
Incorporates
 

Lucene Fields: Patch Available
Resolution Date: 13/Mar/08 12:33 PM


Description
Represented as a coupled graph of class instances, this all-in-memory index store implementation delivers search results up to 100 times faster than the file-centric RAMDirectory, at the cost of greater RAM consumption.

Performance seems to be a little better than log2(n) (binary search). There is no real data on that; it is only an impression.

Populated with a single document, InstantiatedIndex is almost, but not quite, as fast as MemoryIndex.

At 20,000 documents of 10-50 characters each, InstantiatedIndex outperforms RAMDirectory some 30x,
15x at 100 documents of 2000 characters each,
and is linear to RAMDirectory at 10,000 documents of 2000 characters each.

Mileage may vary depending on term saturation.
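
To illustrate what "a coupled graph of class instances" could look like, here is a minimal sketch. It is not the code from the patch: every class, field and method name below is hypothetical, and tokenizing on whitespace is an assumption made only to keep the example short. The point is that terms and documents are ordinary objects that reference each other directly, so a lookup follows references instead of decoding a file-like byte stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an index held as linked objects; not the LUCENE-550 code. */
class ObjectGraphIndexSketch {
    /** A document instance that term entries point to directly. */
    static final class Doc {
        final int id;
        final String text;
        Doc(int id, String text) { this.id = id; this.text = text; }
    }

    /** A term instance holding direct references to the documents that contain it. */
    static final class TermEntry {
        final String text;
        final List<Doc> docs = new ArrayList<>();
        TermEntry(String text) { this.text = text; }
    }

    // Sorted map, so a term lookup costs O(log n) comparisons.
    private final Map<String, TermEntry> terms = new TreeMap<>();
    private int nextId = 0;

    /** Adds a document, splitting on whitespace (an assumption, no analyzer involved). */
    Doc addDocument(String text) {
        Doc doc = new Doc(nextId++, text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) continue;
            TermEntry entry = terms.computeIfAbsent(token, TermEntry::new);
            // Record each document once per term, even if the term repeats.
            if (entry.docs.isEmpty() || entry.docs.get(entry.docs.size() - 1) != doc) {
                entry.docs.add(doc);
            }
        }
        return doc;
    }

    /** Returns the documents containing the term, in insertion order. */
    List<Doc> search(String term) {
        TermEntry entry = terms.get(term.toLowerCase(Locale.ROOT));
        return entry == null ? Collections.<Doc>emptyList() : entry.docs;
    }
}
```

The extra RAM cost mentioned above shows up here as one object per term and per document, plus the reference lists between them.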



Change History
Karl Wettin made changes - 20/Apr/06 12:47 PM
Field Original Value New Value
Attachment InstanciatedIndex.java [ 12325589 ]
Karl Wettin made changes - 20/Apr/06 12:47 PM
Attachment Document.java [ 12325590 ]
Karl Wettin made changes - 20/Apr/06 12:48 PM
Attachment Term.java [ 12325591 ]
Karl Wettin made changes - 21/Apr/06 07:36 AM
Attachment src.tar.gz [ 12325644 ]
Karl Wettin made changes - 21/Apr/06 07:37 AM
Attachment class_diagram.png [ 12325645 ]
Karl Wettin made changes - 22/Apr/06 04:11 AM
Attachment class_diagram.png [ 12325693 ]
Karl Wettin made changes - 10/May/06 04:46 AM
Attachment src_20060509.tar.gz [ 12326477 ]
Karl Wettin made changes - 11/May/06 10:32 PM
Attachment src-1.9karl1_20060611.tar.gz [ 12326568 ]
Karl Wettin made changes - 12/May/06 04:21 AM
Attachment lucene.1.9-karl1.jpg [ 12326582 ]
Karl Wettin made changes - 27/May/06 06:42 PM
Attachment instanciated_20060527.tar [ 12334653 ]
Otis Gospodnetic made changes - 29/May/06 11:02 AM
Attachment Document.java [ 12325590 ]
Otis Gospodnetic made changes - 29/May/06 11:02 AM
Attachment InstanciatedIndex.java [ 12325589 ]
Otis Gospodnetic made changes - 29/May/06 11:02 AM
Attachment Term.java [ 12325591 ]
Otis Gospodnetic made changes - 29/May/06 11:03 AM
Attachment src-1.9karl1_20060611.tar.gz [ 12326568 ]
Otis Gospodnetic made changes - 29/May/06 11:03 AM
Attachment src_20060509.tar.gz [ 12326477 ]
Otis Gospodnetic made changes - 29/May/06 11:03 AM
Attachment src.tar.gz [ 12325644 ]
Karl Wettin made changes - 14/Jun/06 02:35 PM
Attachment InstanciatedIndexTermEnum.java [ 12335421 ]
Karl Wettin made changes - 22/Jul/06 07:50 PM
Attachment lucene2-karl_20060722.tar.gz [ 12337349 ]
Karl Wettin made changes - 23/Jul/06 01:59 PM
Attachment lucene2-karl_20060723.tar.gz [ 12337364 ]
Hoss Man made changes - 23/Jul/06 08:41 PM
Link This issue incorporates LUCENE-581 [ LUCENE-581 ]
Karl Wettin made changes - 22/Nov/06 01:22 PM
Attachment lucene2karl-061122.tar.gz [ 12345480 ]
Karl Wettin made changes - 22/Nov/06 03:14 PM
Description
Original Value: After fixing the bugs, it's now 4.5 -> 5 times the speed. This is true at both index and query time. Sorry if I got your hopes up too much. There are still things to be done, though. I might not have time to do anything with this until next month, so here is the code if anyone wants a peek.

Not good enough for Jira yet, but if someone wants to fool around with it, here it is. The implementation passes a TermEnum -> TermDocs -> Fields -> TermVector comparison against the same data in a Directory.

When it comes to features, offsets don't exist, and positions are stored in an ugly way and have bugs.

You might notice that norms are float[] and not byte[]. I refactored that to see if it would do any good. Bit shifting doesn't take many ticks, so I might just revert that.

I believe the code is quite self-explanatory.

InstanciatedIndex ii = ..
ii.new InstanciatedIndexReader();
ii.addDocument(s).. replace IndexWriter for now.
New Value: A non-file-centric, all-in-memory index. It consumes some 2x the memory of a RAMDirectory (in a term-saturated index) but is between 3x and 60x faster, depending on the application and how one counts. The average query is about 8x faster. IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and InterfaceIndexModifier.

InstantiatedIndex is wrapped in a new top-layer index facade (class Index) that comes with factory methods for writers, readers and searchers for unified index handling. There are decorators with notification handling that can be used for automatically synchronizing searchers on updates, etc.

Index also comes with FS/RAMDirectory implementation.
Summary InstanciatedIndex - faster but memory consuming index InstantiatedIndex - faster but memory consuming index
Assignee Karl Wettin [ karl.wettin ]
Affects Version/s 2.0.0 [ 12310853 ]
Affects Version/s 1.9 [ 12310334 ]
Karl Wettin made changes - 27/Nov/06 04:48 PM
Attachment lucene2-karl_20060723.tar.gz [ 12337364 ]
Karl Wettin made changes - 27/Nov/06 04:48 PM
Attachment lucene2-karl_20060722.tar.gz [ 12337349 ]
Karl Wettin made changes - 27/Nov/06 04:48 PM
Attachment instanciated_20060527.tar [ 12334653 ]
Karl Wettin made changes - 27/Nov/06 04:48 PM
Attachment InstanciatedIndexTermEnum.java [ 12335421 ]
Karl Wettin made changes - 13/Jan/07 12:50 AM
Attachment trunk.diff.bz2 [ 12348879 ]
Karl Wettin made changes - 13/Jan/07 12:51 AM
Attachment lucene2karl-061122.tar.gz [ 12345480 ]
Karl Wettin made changes - 14/Jan/07 04:59 PM
Attachment trunk.diff.bz2 [ 12348919 ]
Karl Wettin made changes - 14/Jan/07 05:00 PM
Attachment trunk.diff.bz2 [ 12348879 ]
Hoss Man made changes - 15/Jan/07 09:30 AM
Attachment test-reports.zip [ 12348941 ]
Attachment trunk.diff [ 12348942 ]
Hoss Man made changes - 15/Jan/07 09:31 AM
Attachment test-reports.zip [ 12348943 ]
Attachment trunk.diff [ 12348944 ]
Hoss Man made changes - 15/Jan/07 09:32 AM
Attachment test-reports.zip [ 12348941 ]
Hoss Man made changes - 15/Jan/07 09:32 AM
Attachment trunk.diff [ 12348942 ]
Karl Wettin made changes - 21/Jan/07 06:29 PM
Attachment trunk.diff.bz2 [ 12349338 ]
Karl Wettin made changes - 21/Jan/07 06:32 PM
Attachment trunk.diff.bz2 [ 12348919 ]
Karl Wettin made changes - 21/Jan/07 06:33 PM
Attachment class_diagram.png [ 12325693 ]
Karl Wettin made changes - 21/Jan/07 06:33 PM
Attachment class_diagram.png [ 12325645 ]
Karl Wettin made changes - 21/Jan/07 06:33 PM
Attachment lucene.1.9-karl1.jpg [ 12326582 ]
Karl Wettin made changes - 21/Jan/07 06:44 PM
Attachment issue550.jpg [ 12349340 ]
Karl Wettin made changes - 27/Jan/07 04:52 PM
Attachment trunk.diff.bz2 [ 12349735 ]
Karl Wettin made changes - 27/Jan/07 04:53 PM
Attachment trunk.diff [ 12348944 ]
Karl Wettin made changes - 27/Jan/07 04:54 PM
Attachment trunk.diff.bz2 [ 12349338 ]
Karl Wettin made changes - 28/Jan/07 06:22 PM
Attachment lucene-550.jpg [ 12349761 ]
Karl Wettin made changes - 28/Jan/07 06:25 PM
Attachment issue550.jpg [ 12349340 ]
Karl Wettin made changes - 28/Jan/07 06:25 PM
Attachment trunk.diff.bz2 [ 12349735 ]
Karl Wettin made changes - 28/Jan/07 06:34 PM
Attachment trunk.diff.bz2 [ 12349763 ]
Karl Wettin made changes - 03/Feb/07 06:15 PM
Attachment trunk.diff.bz2 [ 12350281 ]
Karl Wettin made changes - 10/Feb/07 09:30 PM
Attachment trunk.diff.bz2 [ 12350839 ]
Karl Wettin made changes - 11/Feb/07 05:21 PM
Attachment trunk.diff.bz2 [ 12350875 ]
Karl Wettin made changes - 17/Feb/07 07:26 AM
Attachment trunk.diff.bz2 [ 12351422 ]
Karl Wettin made changes - 17/Feb/07 07:29 AM
Attachment didyoumean.jpg [ 12351423 ]
Karl Wettin made changes - 17/Feb/07 07:54 AM
Link This issue incorporates LUCENE-626 [ LUCENE-626 ]
Karl Wettin made changes - 17/Feb/07 08:24 AM
Comment [ Package-level Javadoc of the adaptive spell checker
{html}
A dictionary with weighted suggestions,
ordered by user activity,
backed by algorithmic suggestions.
<p/>
<h1>What, where, when and how.</h1>
<h2>Goal trees</h2>
A user session could contain multiple quests for content.
For example,
first the user looks for the Apache license,
spells it wrong, inspects different results,
and then the user searches for the author Ivan Goncharov.
<p/>
In this package we call them different goals.
<p/>
User activities are represented by a tree of QueryGoalNodes.
Each node describes a user query,
whether it was a suggestion from the system to a previous user query,
which search results were inspected further,
and when it happened and for how long.
<p/>
Your biggest task when implementing this package
will be to keep track of which goal node the user came from,
so that new queries become children of the previous query.
You will probably add it as metadata to all actions from a user,
e.g. in the href or as a hidden input,
and keep track of the nodes in a Map&lt;Integer, QueryGoalNode> in the user session.
<p/>
It is up to the QueryGoalTreeExtractor implementations to decide which
events in a session are parts of the same goal,
as we don't want to suggest that the user check out Goncharov
when they are looking for the Apache license.
<p/>
In the default query goal tree extractor,
nodes are parts of the same goal as their parent when:
<ul>
  <li>The queries are the same.</li>
  <li>The user took a suggestion from the system.</li>
  <li>The current and the parent queries are similar enough.</li>
  <li>The queries were entered within a short enough time of each other.</li>
</ul>
<p/>
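The four rules above can be sketched as independent checks. This is a minimal illustration, not the default extractor: the class name, the thresholds and the use of Levenshtein distance for "similar enough" are all assumptions. Because the text above does not say how the rules combine, the sketch only reports which rules match and leaves the final decision to the caller.

```java
import java.util.EnumSet;

/** Hypothetical sketch; not the package's default QueryGoalTreeExtractor. */
class SameGoalRulesSketch {
    enum Rule { SAME_QUERY, TOOK_SUGGESTION, SIMILAR_QUERY, CLOSE_IN_TIME }

    // Assumed thresholds, chosen for illustration only.
    static final int MAX_EDIT_DISTANCE = 3;
    static final long MAX_GAP_MILLIS = 60_000L;

    /** Plain Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns every rule from the list that holds for a node and its parent. */
    static EnumSet<Rule> matchedRules(String parentQuery, String query,
                                      boolean tookSuggestion, long gapMillis) {
        EnumSet<Rule> matched = EnumSet.noneOf(Rule.class);
        if (parentQuery.equals(query)) matched.add(Rule.SAME_QUERY);
        if (tookSuggestion) matched.add(Rule.TOOK_SUGGESTION);
        if (editDistance(parentQuery, query) <= MAX_EDIT_DISTANCE) matched.add(Rule.SIMILAR_QUERY);
        if (gapMillis <= MAX_GAP_MILLIS) matched.add(Rule.CLOSE_IN_TIME);
        return matched;
    }
}
```

With these assumed thresholds, "heroes of night and magic" followed five seconds later by "heroes of might and magic" matches the similarity and time rules, while "apache license" followed ten minutes later by "ivan goncharov" matches none.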

<h2>Adaptive training</h2>
Adaptive means that the suggestions for a query
depend on how users have acted previously.
This means that the dictionary could be tampered with quite easily,
and you should therefore try to train only with data from trusted users.
<p/>
The default trainer implementation works like this:
<ul>
  <li>If a user accepts the suggestion made by the system, then we increase the score for that suggestion (positive
    adaptation).
  </li>
  <li>If a user does not accept the suggestion made by the system, then we decrease the score for that suggestion
    (negative adaptation).
  </li>
  <li>
    If the goal tree consists of a single query only (perhaps with multiple inspections),
    then we adapt negatively once again.
  </li>
  <li>
    Suggestions are the queries with inspections, ordered by the classification weight.
    All the queries in the goal without inspections will be adapted positively with
    the query with inspections that has the shortest edit distance.
  </li>
  <li>Suggests back from the best goal to the second-best goal, e.g. homm -> heroes of might and magic -> homm.</li>
</ul>
<p/>
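Positive and negative adaptation, the first two rules above, can be sketched as a score table that is scaled up or down. This is only an illustration under stated assumptions: the class name, the initial score and both factors are hypothetical, and the default trainer may update scores in a different way.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of positive/negative adaptation; not the package's default trainer. */
class AdaptiveScoreSketch {
    // Assumed constants; the real values are not stated in this document.
    static final double INITIAL_SCORE = 1.0;
    static final double POSITIVE_FACTOR = 1.1;
    static final double NEGATIVE_FACTOR = 0.9;

    // query -> (suggestion -> score)
    private final Map<String, Map<String, Double>> scores = new HashMap<>();

    /** Scales the score of a suggestion up if the user accepted it, down otherwise. */
    void adapt(String query, String suggestion, boolean accepted) {
        final double factor = accepted ? POSITIVE_FACTOR : NEGATIVE_FACTOR;
        scores.computeIfAbsent(query, q -> new HashMap<>())
              .merge(suggestion, INITIAL_SCORE * factor, (old, unused) -> old * factor);
    }

    /** Returns the current score, or 0.0 if the pair has never been trained. */
    double score(String query, String suggestion) {
        Map<String, Double> bySuggestion = scores.get(query);
        if (bySuggestion == null) return 0.0;
        Double value = bySuggestion.get(suggestion);
        return value == null ? 0.0 : value;
    }
}
```

A multiplicative update is chosen here because repeated rejections then push a score toward zero without it ever becoming negative.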

<h2>Suggesting</h2>
Suggestions are created by the suggester, which navigates a dictionary.
The default implementation works like this:
<ul>
  <li>
    Returns the highest-scoring suggestion available,
    unless its score is lower than the suggestion suppression threshold.
  </li>
  <li>
    If there are no suggestions available, the second level suggesters
    registered with the dictionary are used to produce the suggestions.
  </li>
  <li>
    If the top-scoring suggestion is the same as the query,
    and the second best is not suppressed below the threshold,
    the two change order.
  </li>
</ul>
Ignoring a suggestion 50 times or so with a DefaultTrainer makes a score hit 0.05d.
<p/>
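The three rules of the default suggester can be sketched as a single selection method. Everything here is an assumption made for illustration: the class name, the threshold value, the use of a plain map for scored suggestions, a function standing in for the second level suggesters, and returning null when the best suggestion is suppressed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the selection rules; not the package's default suggester. */
class SuggesterRulesSketch {
    // Assumed value; the real suppression threshold is not stated in this document.
    static final double SUPPRESSION_THRESHOLD = 0.05;

    /**
     * @param scored      suggestions for the query with their scores
     * @param secondLevel stand-in for the second level suggesters
     * @return the chosen suggestion, or null if the best one is suppressed
     */
    static String suggest(String query, Map<String, Double> scored,
                          Function<String, String> secondLevel) {
        // Rule 2: nothing in the dictionary, so fall back to the second level.
        if (scored.isEmpty()) return secondLevel.apply(query);

        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scored.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        Map.Entry<String, Double> best = ranked.get(0);
        // Rule 3: the top suggestion equals the query, so prefer an unsuppressed runner-up.
        if (best.getKey().equals(query) && ranked.size() > 1
                && ranked.get(1).getValue() >= SUPPRESSION_THRESHOLD) {
            return ranked.get(1).getKey();
        }
        // Rule 1: return the best suggestion unless it is suppressed.
        return best.getValue() >= SUPPRESSION_THRESHOLD ? best.getKey() : null;
    }
}
```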

<h2>Second level suggestion</h2>
If the dictionary does not contain a suggestion for a given query,
it will be passed on to any available SecondLevelSuggester,
usually an algorithmic suggestion scheme
that hopefully can come up with a suggestion.
As a user accepts such a suggestion it will be trained
and become a part of the adaptive layer.
<h3>Token suggesters</h3>
The lowest level of suggestion is single-token suggestions,
and the default implementation is a refactor of contrib/spellcheck.
<h3>TokenPhraseSuggester</h3>
A layer on top of the single-token suggesting that enables multi-token (phrase) suggestions.
<p/>
For example, the user enters the query "thh best game".
The matrix of similar tokens is:
<pre>
  the best game
  tho rest fame
           lame
</pre>
These can be represented in a finite number of ways:
<pre>
  tho best game
  tho best fame
  tho best lame
  tho rest game
  tho rest fame
  tho rest lame
  the best game
  the best fame
  the best lame
  the rest game
  the rest fame
  the rest lame
</pre>
A query is created for each combination (by default a SpanNearQuery) to find valid suggestions.
<p/>
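The expansion of the token matrix into phrases is a Cartesian product over the alternatives at each position. A minimal sketch, with a hypothetical class and method name, and without the query construction that follows in the real suggester:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of expanding per-token alternatives into candidate phrases. */
class TokenCombinationsSketch {
    /**
     * Builds every phrase that takes exactly one alternative per position.
     * The number of phrases is the product of the alternative counts.
     */
    static List<String> combinations(List<List<String>> alternativesPerPosition) {
        List<String> phrases = new ArrayList<>();
        phrases.add(""); // seed so the first position has something to extend
        for (List<String> alternatives : alternativesPerPosition) {
            List<String> extended = new ArrayList<>();
            for (String prefix : phrases) {
                for (String token : alternatives) {
                    extended.add(prefix.isEmpty() ? token : prefix + " " + token);
                }
            }
            phrases = extended;
        }
        return phrases;
    }
}
```

For the matrix above (2 x 2 x 3 alternatives) this yields the same 12 phrases, although in a different order than listed. The count grows multiplicatively with each token, which is why limiting the number of suggestions per token matters for response time.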
If any of the valid hits contains a TermPositionVector,
it will be analyzed, and the query is suggested in the order of the terms in the index.
E.g. the query "camel that broke the staw" is suggested as "straw that broke the camel".
(todo: if term positions are available and stored, suggest that for cosmetic reasons.)



<h1>Consumer interface example</h1>
Code from the test cases.
<pre>
  private SuggestionFacade&lt;R> suggestionFacade;

  @Override
  protected void setUp() throws Exception {
    suggestionFacade = new SuggestionFacade&lt;R>();
  }

  public void testBasicTraining() throws Exception {
    QueryGoalNode&lt;R> node;

    node = new QueryGoalNode&lt;R>(null, "heroes of nmight and magic", 3);
    node = new QueryGoalNode&lt;R>(node, "heroes of night and magic", 3);
    node = new QueryGoalNode&lt;R>(node, "heroes of might and magic", 10);
    node.new Inspection(23, QueryGoalNode.GOAL);
    suggestionFacade.queueGoalTree(node.getRoot());

    node = new QueryGoalNode&lt;R>(null, "heroes of night and magic", 3);
    node = new QueryGoalNode&lt;R>(node, "heroes of knight and magic", 7);
    node = new QueryGoalNode&lt;R>(node, "heroes of might and magic", 20);
    node.new Inspection(23, QueryGoalNode.GOAL);
    suggestionFacade.queueGoalTree(node);

    node = new QueryGoalNode&lt;R>(null, "heroes of might and magic", 20, 1L);
    suggestionFacade.queueGoalTree(node);

    node = new QueryGoalNode&lt;R>(null, "heroes of night and magic", 7, 0L);
    node = new QueryGoalNode&lt;R>(node, "heroes of light and magic", 14, 1L);
    node = new QueryGoalNode&lt;R>(node, "heroes of might and magic", 2, 6L);
    node.new Inspection(23, QueryGoalNode.GOAL);
    node.new Inspection(23, QueryGoalNode.GOAL);
    suggestionFacade.queueGoalTree(node);

    node = new QueryGoalNode&lt;R>(null, "heroes of night and magic", 4, 0L);
    node = new QueryGoalNode&lt;R>(node, "heroes of knight and magic", 17, 1L);
    node = new QueryGoalNode&lt;R>(node, "heroes of might and magic", 2, 2L);
    node.new Inspection(23, QueryGoalNode.GOAL);
    suggestionFacade.queueGoalTree(node);

    suggestionFacade.flush();

    assertEquals("heroes of might and magic", suggestionFacade.didYouMean("heroes of light and magic"));
    assertEquals("heroes of might and magic", suggestionFacade.didYouMean("heroes of night and magic"));
    assertEquals("heroes of might and magic", suggestionFacade.didYouMean("heroes ofnight andmagic"));
  }
</pre>
<p/>
Notice the last assertion:
<pre>
  assertEquals("heroes of might and magic", suggestionFacade.didYouMean("heroes ofnight andmagic"));
</pre>
The dictionary strips punctuation and whitespace from keys,
resulting in better support for de/compositions of words.
<p/>
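A minimal sketch of such key normalization, assuming ASCII punctuation and a hypothetical class and method name (the dictionary's real rules may differ, for example in case handling):

```java
/** Hypothetical sketch of dictionary key normalization; not the package's code. */
class DictionaryKeySketch {
    /** Removes punctuation and whitespace so differently split queries share one key. */
    static String toKey(String query) {
        return query.replaceAll("[\\p{Punct}\\s]+", "");
    }
}
```

With this normalization, "heroes of night and magic" and "heroes ofnight andmagic" both map to the key "heroesofnightandmagic", so they look up the same dictionary entry.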
The example above relies only on user session analysis and adaptation;
there are no algorithmic suggestions if the user types in something nobody has misspelled before.
To get those, simply add a second level suggester to the dictionary:
<pre>
  protected void setUp() throws Exception {
    suggestionFacade = new SuggestionFacade&lt;R>();

    // your primary index that suggestions must match.
    IndexFacade aprioriIndex = new IndexFacade(new RAMDirectoryIndex());
    String aprioriField = "title";

    // build the ngram suggester
    IndexFacade ngramIndex = new IndexFacade(new RAMDirectoryIndex());
    NgramTokenSuggester ngramSuggester = new NgramTokenSuggester(ngramIndex);
    ngramSuggester.indexDictionary(new TermEnumIterator(aprioriIndex.getReader(), aprioriField));

    // the greater the value, the better the results, but with a longer response time.
    int maxSuggestionsPerToken = 3;

    // add the ngram suggester, wrapped in a single token phrase suggester, as second level suggester.
    suggestionFacade.getDictionary().getPrioritesBySecondLevelSuggester().put(
        new SecondLevelTokenPhraseSuggester(
            ngramSuggester, aprioriField, false, maxSuggestionsPerToken, new WhitespaceAnalyzer(), aprioriIndex),
        1d);
  }
</pre>
{html} ]
Karl Wettin made changes - 19/Feb/07 02:20 PM
Attachment trunk.diff.bz2 [ 12351505 ]
Karl Wettin made changes - 20/Feb/07 08:32 PM
Attachment trunk.diff.bz2 [ 12351627 ]
Karl Wettin made changes - 25/Feb/07 11:37 PM
Attachment trunk.diff.bz2 [ 12352016 ]
Karl Wettin made changes - 03/Mar/07 01:18 PM
Attachment trunk.diff.bz2 [ 12352503 ]
Karl Wettin made changes - 03/Mar/07 07:56 PM
Attachment trunk.diff.bz2 [ 12352513 ]
Karl Wettin made changes - 03/Mar/07 07:57 PM
Attachment didyoumean.jpg [ 12351423 ]
Karl Wettin made changes - 13/Mar/07 02:22 AM
Attachment trunk.diff.bz2 [ 12353154 ]
Karl Wettin made changes - 17/Mar/07 08:11 PM
Attachment HitCollectionBench.jpg [ 12353574 ]
Karl Wettin made changes - 17/Mar/07 08:18 PM
Link This issue incorporates LUCENE-626 [ LUCENE-626 ]
Karl Wettin made changes - 17/Mar/07 08:34 PM
Attachment HitCollectionBench.jpg [ 12353575 ]
Karl Wettin made changes - 17/Mar/07 08:35 PM
Attachment HitCollectionBench.jpg [ 12353574 ]
Karl Wettin made changes - 18/Mar/07 03:50 PM
Attachment HitCollectionBench.jpg [ 12353601 ]
Karl Wettin made changes - 18/Mar/07 03:50 PM
Attachment HitCollectionBench.jpg [ 12353575 ]
Karl Wettin made changes - 04/Aug/07 02:28 PM
Attachment LUCENE-550_20070804_no_core_changes.txt [ 12363164 ]
Grant Ingersoll made changes - 07/Aug/07 10:28 PM
Assignee Karl Wettin [ karl.wettin ] Grant Ingersoll [ gsingers ]
Karl Wettin made changes - 08/Aug/07 09:25 PM
Attachment LUCENE-550_20070808_no_core_changes.txt [ 12363446 ]
Karl Wettin made changes - 17/Aug/07 10:09 PM
Attachment LUCENE-550_20070817_no_core_changes.txt [ 12364065 ]
Karl Wettin made changes - 27/Sep/07 11:49 PM
Attachment LUCENE-550_20070928_no_core_changes.txt [ 12366707 ]
Karl Wettin made changes - 08/Oct/07 12:24 AM
Attachment LUCENE-550_20071008_no_core_changes.txt [ 12367226 ]
Karl Wettin made changes - 17/Oct/07 03:17 AM
Attachment LUCENE-550_20071017_no_core_changes.txt [ 12367846 ]
Karl Wettin made changes - 19/Oct/07 02:00 PM
Attachment LUCENE-550_20071019_no_core_changes.txt [ 12368015 ]
Karl Wettin made changes - 21/Oct/07 03:44 PM
Attachment LUCENE-550_20071021_no_core_changes.txt [ 12368100 ]
Karl Wettin made changes - 23/Oct/07 08:20 PM
Attachment lucene-550.jpg [ 12349761 ]
Karl Wettin made changes - 23/Oct/07 08:20 PM
Attachment LUCENE-550_20070804_no_core_changes.txt [ 12363164 ]
Karl Wettin made changes - 23/Oct/07 08:20 PM
Attachment LUCENE-550_20070808_no_core_changes.txt [ 12363446 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment LUCENE-550_20070817_no_core_changes.txt [ 12364065 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment LUCENE-550_20070928_no_core_changes.txt [ 12366707 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment LUCENE-550_20071008_no_core_changes.txt [ 12367226 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment LUCENE-550_20071017_no_core_changes.txt [ 12367846 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment LUCENE-550_20071019_no_core_changes.txt [ 12368015 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment trunk.diff.bz2 [ 12353154 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment trunk.diff.bz2 [ 12352513 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment trunk.diff.bz2 [ 12352503 ]
Karl Wettin made changes - 23/Oct/07 08:21 PM
Attachment trunk.diff.bz2 [ 12352016 ]
Karl Wettin made changes - 23/Oct/07 08:22 PM
Attachment trunk.diff.bz2 [ 12351627 ]
Karl Wettin made changes - 23/Oct/07 08:22 PM
Attachment trunk.diff.bz2 [ 12351505 ]
Karl Wettin made changes - 23/Oct/07 08:22 PM
Attachment trunk.diff.bz2 [ 12351422 ]
Karl Wettin made changes - 23/Oct/07 08:22 PM
Attachment trunk.diff.bz2 [ 12350875 ]
Karl Wettin made changes - 23/Oct/07 08:22 PM
Attachment trunk.diff.bz2 [ 12350839 ]
Karl Wettin made changes - 23/Oct/07 08:22 PM
Attachment trunk.diff.bz2 [ 12350281 ]
Karl Wettin made changes - 23/Oct/07 08:22 PM
Attachment trunk.diff.bz2 [ 12349763 ]
Karl Wettin made changes - 23/Oct/07 08:27 PM
Description
Original Value: A non-file-centric, all-in-memory index. It consumes some 2x the memory of a RAMDirectory (in a term-saturated index) but is between 3x and 60x faster, depending on the application and how one counts. The average query is about 8x faster. IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and InterfaceIndexModifier.

InstantiatedIndex is wrapped in a new top-layer index facade (class Index) that comes with factory methods for writers, readers and searchers for unified index handling. There are decorators with notification handling that can be used for automatically synchronizing searchers on updates, etc.

Index also comes with FS/RAMDirectory implementation.
New Value: Represented as a coupled graph of class instances, this all-in-memory index store implementation delivers search results up to 100 times faster than the file-centric RAMDirectory, at the cost of greater RAM consumption.

Performance seems to be a little better than log2(n) (binary search). There is no real data on that; it is only an impression.

Populated with a single document, InstantiatedIndex is almost, but not quite, as fast as MemoryIndex.

At 20,000 documents of 10-50 characters each, InstantiatedIndex outperforms RAMDirectory some 30x,
15x at 100 documents of 2000 characters each,
and is linear to RAMDirectory at 10,000 documents of 2000 characters each.

Mileage may vary depending on term saturation.


Lucene Fields [Patch Available]
Olivier Chafik made changes - 25/Feb/08 06:38 PM
Attachment BinarySearchUtils.Apache.java [ 12376423 ]
Grant Ingersoll made changes - 08/Mar/08 11:35 PM
Status Open [ 1 ] In Progress [ 3 ]
Grant Ingersoll made changes - 08/Mar/08 11:36 PM
Attachment LUCENE-550.patch [ 12377464 ]
Karl Wettin made changes - 09/Mar/08 03:09 AM
Attachment classdiagram.jpg [ 12377471 ]
Attachment LUCENE-550.patch [ 12377470 ]
Karl Wettin made changes - 09/Mar/08 04:37 AM
Attachment classdiagram.jpg [ 12377471 ]
Karl Wettin made changes - 09/Mar/08 04:44 AM
Attachment LUCENE-550.patch [ 12377473 ]
Attachment classdiagram.png [ 12377472 ]
Grant Ingersoll made changes - 13/Mar/08 12:33 PM
Status In Progress [ 3 ] Resolved [ 5 ]
Resolution Fixed [ 1 ]