[LUCENE-2373] Create a Codec to work with streaming and append-only filesystems - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0-ALPHA
Component/s: core/index
Labels: None
Lucene Fields: New

Description

Since the early 2.x releases, Lucene has used a skip/seek/write trick to patch the length of the terms dict into a place near the start of the output data file. This, however, makes it impossible to use Lucene with append-only filesystems such as HDFS.

In the post-flex trunk the following code in StandardTermsDictWriter initiates this:

    // Count indexed fields up front
    CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT); 

    out.writeLong(0);                             // leave space for end index pointer

and completes this in close():

    out.seek(CodecUtil.headerLength(CODEC_NAME));
    out.writeLong(dirStart);

I propose changing this layout so that the pointer is simply stored at the end of the file. It is always 8 bytes long, and we know the final length of the file from Directory, so reading it takes a single additional seek(length - 8), which is not much considering the benefits.
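The proposed layout can be illustrated with a minimal, self-contained sketch. It does not use Lucene's IndexOutput or CodecUtil; the class name, the HEADER constant, and both method names are hypothetical, and an in-memory byte array stands in for the file. The writer only ever moves forward, and the reader finds the pointer from the total length alone:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration of the "pointer at end of file" layout;
// not the actual Lucene codec code.
final class TrailerPointerSketch {
    // Made-up 4-byte header standing in for CodecUtil.writeHeader(...).
    static final int HEADER = 0x54444943;

    // Writes header, terms data, directory, then the 8-byte directory
    // pointer as the very last bytes. No seek back is ever needed.
    static byte[] write(byte[] termsBody, byte[] directory) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(HEADER);
            out.write(termsBody);
            long dirStart = out.size();   // offset where the directory begins
            out.write(directory);
            out.writeLong(dirStart);      // trailer: always 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the pointer back: the equivalent of seek(length - 8), readLong().
    static long readDirStart(byte[] file) {
        return ByteBuffer.wrap(file).getLong(file.length - Long.BYTES);
    }
}
```

In this sketch, writing a 3-byte body and a 2-byte directory gives a 17-byte file whose last 8 bytes hold the offset 7 (4 header bytes plus 3 body bytes). A real implementation would take the file length from Directory, as described above.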

Attachments

appending.patch | 05/Jul/10 15:00 | 51 kB | Andrzej Bialecki
appending.patch | 25/Jun/10 23:35 | 48 kB | Andrzej Bialecki
LUCENE-2372-2.patch | 08/Jul/10 12:55 | 55 kB | Andrzej Bialecki
LUCENE-2373.patch | 05/Jul/10 16:18 | 53 kB | Michael McCandless

Issue Links

is depended upon by

LUCENE-2446 Add checksums to Lucene segment files (Closed)

People

Assignee: Unassigned

Reporter: Andrzej Bialecki

Votes: 0

Watchers: 0

Dates

Created: 06/Apr/10 22:56

Updated: 28/Aug/22 12:23

Resolved: 09/Jul/10 21:11