[LUCENE-1292] Tag Index - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.3.1
Fix Version/s: None
Component/s: core/index
Labels: None
Lucene Fields: New

Description

The tag index addresses three problems: slow field cache loading, slow range queries, and the need to reindex an entire document in order to update fields that are not tokenized.

The tag index holds untokenized terms with a docfreq of 1 in an index file that resembles a term dictionary. The file also stores the docs per term, similar to LUCENE-1278. The index also has a transaction log and an in-memory index for realtime updates to the tags. The transaction log is periodically merged into the existing tag term dictionary index file.
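The interplay between the merged term dictionary and the transaction log can be illustrated with a minimal in-memory sketch. It is not taken from the attached patch: the class `TagStoreSketch`, its methods, and the use of plain strings for terms are all assumptions made for illustration, and a sorted map stands in for the on-disk file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the patch.
final class TagStoreSketch {
    // One logged update: add or remove a doc id for a term.
    private static final class LogEntry {
        final String term;
        final int docId;
        final boolean add;

        LogEntry(String term, int docId, boolean add) {
            this.term = term;
            this.docId = docId;
            this.add = add;
        }
    }

    // Stand-in for the on-disk term dictionary: sorted term -> sorted doc ids.
    private final TreeMap<String, TreeSet<Integer>> merged = new TreeMap<>();
    // Stand-in for the transaction log: updates not yet merged.
    private final List<LogEntry> log = new ArrayList<>();

    void add(String term, int docId) {
        log.add(new LogEntry(term, docId, true));
    }

    void delete(String term, int docId) {
        log.add(new LogEntry(term, docId, false));
    }

    // Realtime view: merged postings overlaid with pending log entries, in log order.
    SortedSet<Integer> docs(String term) {
        TreeSet<Integer> result = new TreeSet<>(merged.getOrDefault(term, new TreeSet<>()));
        for (LogEntry e : log) {
            if (!e.term.equals(term)) {
                continue;
            }
            if (e.add) {
                result.add(e.docId);
            } else {
                result.remove(e.docId);
            }
        }
        return result;
    }

    // Periodic merge: fold the log into the dictionary, then clear the log.
    void mergeLog() {
        for (LogEntry e : log) {
            if (e.add) {
                merged.computeIfAbsent(e.term, k -> new TreeSet<>()).add(e.docId);
            } else {
                TreeSet<Integer> docs = merged.get(e.term);
                if (docs != null) {
                    docs.remove(e.docId);
                    if (docs.isEmpty()) {
                        merged.remove(e.term);
                    }
                }
            }
        }
        log.clear();
    }

    int pendingUpdates() {
        return log.size();
    }
}
```

In this sketch a lookup returns the same result before and after `mergeLog()`; the merge only moves pending updates out of the log so that reads no longer have to replay them.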

The TagIndexReader extends IndexReader and is unified with a regular index by ParallelReader. To support the IndexReader.document method, there is a doc-id-to-terms skip pointer file, which contains a pointer for looking up the terms for a document.
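One way to picture the doc id to terms pointer file is as an array of byte offsets into a data area that holds each document's terms. The sketch below is an assumption about the general shape, not the format used in the patch; `DocTermsPointerSketch` and its record layout (a term count followed by UTF strings) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the record layout is an assumption, not the patch's format.
final class DocTermsPointerSketch {
    // Stand-in for the data file holding each document's terms.
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(data);
    // Stand-in for the pointer file: pointers.get(docId) is a byte offset into data.
    private final List<Integer> pointers = new ArrayList<>();

    // Appends the terms of the next document; doc ids are assigned sequentially from 0.
    int addDocument(List<String> terms) {
        try {
            pointers.add(out.size()); // bytes written so far = start of this record
            out.writeInt(terms.size());
            for (String term : terms) {
                out.writeUTF(term);
            }
            return pointers.size() - 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Follows the pointer for docId and reads back that document's terms.
    List<String> terms(int docId) {
        byte[] bytes = data.toByteArray();
        int start = pointers.get(docId);
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes, start, bytes.length - start))) {
            int count = in.readInt();
            List<String> terms = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                terms.add(in.readUTF());
            }
            return terms;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The pointer array makes a per-document lookup a single seek plus a short read, which is the kind of access a document-retrieval method needs.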

There is a higher-level class that encapsulates writing a document with tag fields to IndexWriter and TagIndexWriter. This requires a hook into IndexWriter to coordinate doc ids and flushing segments to disk.
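The doc id coordination described above could look roughly like the following sketch. It uses in-memory stand-ins rather than the real IndexWriter, so the class `CoordinatedWriterSketch` and all of its methods are hypothetical; segment flushing is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; in-memory stand-ins replace IndexWriter and TagIndexWriter.
final class CoordinatedWriterSketch {
    // Stand-in for the main index: position in the list is the doc id.
    private final List<String> mainDocs = new ArrayList<>();
    // Stand-in for the tag index: tag term -> doc ids carrying that tag.
    private final Map<String, List<Integer>> tagDocs = new LinkedHashMap<>();

    // Writes the body to the main index and the tags to the tag index
    // under the same doc id, so a parallel reader can line the two up.
    int addDocument(String body, List<String> tags) {
        int docId = mainDocs.size(); // id the main index will assign
        mainDocs.add(body);
        for (String tag : tags) {
            tagDocs.computeIfAbsent(tag, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    String body(int docId) {
        return mainDocs.get(docId);
    }

    List<Integer> docsFor(String tag) {
        return tagDocs.getOrDefault(tag, Collections.emptyList());
    }
}
```

The point of the sketch is only that both sides must agree on the doc id at write time; in a real index that agreement also has to survive segment flushes, which is why the description calls for a hook into IndexWriter.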

The writer class could be as simple as:

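import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;

public class TagIndexWriter {

  // Presumably tags every document yielded by the iterator with the term.
  public void add(Term term, DocIdSetIterator iterator) {
  }

  // Presumably removes the term from every document yielded by the iterator.
  public void delete(Term term, DocIdSetIterator iterator) {
  }
}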
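
To make the intended semantics of this interface concrete, here is a minimal in-memory sketch of one possible reading: `add` tags every doc id the iterator yields, and `delete` untags them. It deliberately avoids Lucene types so that it runs on its own; `InMemoryTagWriterSketch`, the `String` terms, and the use of `PrimitiveIterator.OfInt` in place of DocIdSetIterator are all assumptions for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.PrimitiveIterator;

// Hypothetical sketch; String and PrimitiveIterator.OfInt stand in for
// Lucene's Term and DocIdSetIterator.
final class InMemoryTagWriterSketch {
    // Term -> set of doc ids carrying that term.
    private final Map<String, BitSet> docsByTerm = new HashMap<>();

    // Tags every doc id yielded by the iterator with the term.
    void add(String term, PrimitiveIterator.OfInt docIds) {
        BitSet bits = docsByTerm.computeIfAbsent(term, k -> new BitSet());
        while (docIds.hasNext()) {
            bits.set(docIds.nextInt());
        }
    }

    // Removes the term from every doc id yielded by the iterator.
    void delete(String term, PrimitiveIterator.OfInt docIds) {
        BitSet bits = docsByTerm.get(term);
        if (bits == null) {
            return;
        }
        while (docIds.hasNext()) {
            bits.clear(docIds.nextInt());
        }
        if (bits.isEmpty()) {
            docsByTerm.remove(term); // drop terms that no longer match any doc
        }
    }

    // Doc ids currently tagged with the term, in ascending order.
    int[] docs(String term) {
        BitSet bits = docsByTerm.get(term);
        return bits == null ? new int[0] : bits.stream().toArray();
    }
}
```

A caller could pass `java.util.stream.IntStream.of(3, 7, 9).iterator()` as the doc id iterator. Because only the tag postings change, no document has to be reindexed to update a tag, which is the motivation stated in the description.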

Attachments

lucene-1292.patch (108 kB), attached 07/Jun/08 17:29 by Jason Rutherglen

People

Assignee: Unassigned

Reporter: Jason Rutherglen

Votes: 0

Watchers: 4

Dates

Created: 21/May/08 14:06

Updated: 28/Aug/22 11:50

Resolved: 24/Jan/11 21:16