[LUCENE-2425] An Anti-Merging Multi-Directory Indexing Framework - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.1
Fix Version/s: None
Component/s: core/index, modules/other
Labels:
- lucene
- policy
- split

Lucene Fields: New, Patch Available

Description

By design, a Lucene index merges documents that are spread across multiple segments into fewer segments in order to optimize its directory structure, which in turn improves search performance. In particular, it relies on a merge policy to specify the set of merge operations to perform when the index is optimized.

Often, there is a need to do the exact opposite, which is to "split" the documents. This calls for a mechanism that sub-divides documents according to some (ideally user-defined) algorithm. For example, one may wish to sub-divide (or partition) documents based on parameters such as time, space, real-timeliness, and so on. Here we describe an indexing framework that builds on the Lucene index writer and reader to address use cases in which documents need to diverge rather than converge.
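To make the idea of a user-defined partitioning algorithm concrete, here is a minimal sketch in plain Java that groups documents by the UTC day of their timestamp. It does not use Lucene, and every name in it (PartitionSketch, Doc, partition, utcDay) is hypothetical and not taken from the attached patch.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from LUCENE-2425.patch.
final class PartitionSketch {

    /** Stand-in for an indexed document: an id plus a timestamp. */
    static final class Doc {
        final String id;
        final Instant timestamp;

        Doc(String id, Instant timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    /**
     * Groups document ids by a user-supplied key function.
     * Each distinct key corresponds to one would-be split.
     * Keys appear in the order in which they are first seen.
     */
    static <K> Map<K, List<String>> partition(List<Doc> docs, Function<Doc, K> keyFn) {
        Map<K, List<String>> splits = new LinkedHashMap<>();
        for (Doc doc : docs) {
            splits.computeIfAbsent(keyFn.apply(doc), k -> new ArrayList<>()).add(doc.id);
        }
        return splits;
    }

    /** Example key function: the UTC calendar day of the document's timestamp. */
    static LocalDate utcDay(Doc doc) {
        return doc.timestamp.atZone(ZoneOffset.UTC).toLocalDate();
    }
}
```

Calling PartitionSketch.partition(docs, PartitionSketch::utcDay) yields one group per day; swapping in a different key function (for example, one based on a region field) changes the partitioning without touching the grouping code.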

In brief, the framework associates zero or more sub-directories with the index's directory, which serve to complement it in some manner. The sub-directories (a.k.a. splits) are managed by a split policy, which is notified of all changes made to the index directory (a.k.a. the super-directory) and can therefore modify its sub-directories as it sees fit. To make the index reader and writer "observable", we extend Lucene's reader and writer so that every method that could potentially change the index has a hook. These hooks propagate such changes to the split policy, which essentially acts as a listener on the index.
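The listener arrangement described above can be sketched roughly as follows. This is a plain-Java illustration under stated assumptions, not the patch's API: SimpleWriter is a stand-in for a Lucene index writer, and SplitPolicy, ObservableWriter, and RecordingPolicy are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from LUCENE-2425.patch.
final class ObservableWriterSketch {

    /** Listener that is notified of every change made to the super-directory. */
    interface SplitPolicy {
        void onAddDocument(String doc);
        void onDeleteAll();
        void onCommit();
    }

    /** Stand-in for an index writer; documents are plain strings here. */
    static class SimpleWriter {
        protected final List<String> docs = new ArrayList<>();

        void addDocument(String doc) { docs.add(doc); }
        void deleteAll() { docs.clear(); }
        void commit() { /* nothing to flush in this sketch */ }
        int numDocs() { return docs.size(); }
    }

    /** Extends the writer and forwards each mutating call to the policy. */
    static final class ObservableWriter extends SimpleWriter {
        private final SplitPolicy policy;

        ObservableWriter(SplitPolicy policy) { this.policy = policy; }

        @Override
        void addDocument(String doc) {
            super.addDocument(doc);
            policy.onAddDocument(doc); // hook: let the policy react to the new document
        }

        @Override
        void deleteAll() {
            super.deleteAll();
            policy.onDeleteAll();
        }

        @Override
        void commit() {
            super.commit();
            policy.onCommit();
        }
    }

    /** Trivial policy that only records the events it receives. */
    static final class RecordingPolicy implements SplitPolicy {
        final List<String> events = new ArrayList<>();

        @Override public void onAddDocument(String doc) { events.add("add:" + doc); }
        @Override public void onDeleteAll() { events.add("deleteAll"); }
        @Override public void onCommit() { events.add("commit"); }
    }
}
```

A real split policy would create, update, or drop sub-directories in these callbacks instead of recording strings; the point of the sketch is only that the writer itself stays unaware of what the policy does.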

We refer to each sub-directory (or split), as well as the super-directory, as a sub-index of the containing index (a.k.a. the split index). Note that a sub-directory need not be co-located with the super-directory. Furthermore, the split policy in turn relies on one or more split rules to determine when to add or remove sub-directories. This cleanly separates the event that triggers a split from the management of the splits themselves.
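One way to picture the separation between split rules and the split policy is the sketch below. It assumes a single hypothetical rule that fires when the current split holds a maximum number of documents; SplitRule, MaxDocsRule, and RuleDrivenPolicy are illustrative names and are not taken from the patch, and in-memory lists stand in for sub-directories.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from LUCENE-2425.patch.
final class SplitRuleSketch {

    /** Decides WHEN a split should happen; knows nothing about how splits are stored. */
    interface SplitRule {
        boolean shouldSplit(List<String> currentSplit, String incomingDoc);
    }

    /** Example rule: start a new split once the current one holds maxDocs documents. */
    static final class MaxDocsRule implements SplitRule {
        private final int maxDocs;

        MaxDocsRule(int maxDocs) { this.maxDocs = maxDocs; }

        @Override
        public boolean shouldSplit(List<String> currentSplit, String incomingDoc) {
            return currentSplit.size() >= maxDocs;
        }
    }

    /** Manages the splits; consults its rules but owns the sub-directory bookkeeping. */
    static final class RuleDrivenPolicy {
        private final List<SplitRule> rules;
        private final List<List<String>> splits = new ArrayList<>();

        RuleDrivenPolicy(List<SplitRule> rules) {
            this.rules = rules;
            splits.add(new ArrayList<>()); // begin with one empty split
        }

        void onAddDocument(String doc) {
            List<String> current = splits.get(splits.size() - 1);
            for (SplitRule rule : rules) {
                if (rule.shouldSplit(current, doc)) {
                    current = new ArrayList<>(); // any firing rule opens a new split
                    splits.add(current);
                    break;
                }
            }
            current.add(doc);
        }

        List<List<String>> splits() { return splits; }
    }
}
```

With a MaxDocsRule of 2, adding five documents in sequence leaves three splits of sizes 2, 2, and 1. Replacing the rule (say, with a time-based one) changes when splits occur without changing how the policy stores them.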

Attachments

LUCENE-2425.patch
02/May/10 15:33
112 kB
Karthick Sankarachary

Issue Links

is required by

LUCENE-2429 A Rotating Split Policy For Managing Bounded Indices

Open

LUCENE-2430 An Archiving Split Policy For Managing Non-Searchable Documents

Open

LUCENE-2431 A Real-Time Split Policy For Searching In Real-Time

Open

LUCENE-2432 A Caching Split Policy For Real-Time Index Caching

Open

LUCENE-2433 A Remoting Split Policy For Managing Multiple Remote Directories

Open

LUCENE-2434 A Mirroring Split Policy For Load-Balancing Search Requests

Open

LUCENE-2435 A Sharding Split Policy For Load-Sharing Index Writes

Open

Activity

People

Assignee: Unassigned

Reporter: Karthick Sankarachary

Votes: 0

Watchers: 3

Dates

Created: 01/May/10 21:24

Updated: 28/Aug/22 12:25