[LUCENE-8438] RAMDirectory speed improvements and cleanup - ASF JIRA

Details

Type: Improvement
Status: Reopened
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 9.0
Component/s: None
Labels: None
Lucene Fields: New

Description

RAMDirectory screams for a cleanup. It is used and abused in many places, and even if we discourage its use in favor of native (mmapped) buffers, there seem to be benefits to keeping RAMDirectory available (quick throw-away indexes without the need to set up an external tmpfs, for example).

Currently RAMDirectory performs very poorly under concurrent loads. The implementation is also open to all sorts of abuse: the streams can be reset, and they are used all over the codebase as temporary buffers, even where no RAMDirectory is involved. This complicates the implementation and is pretty confusing.

As an example of how dramatically slow RAMDirectory is under concurrent load, consider this PoC pseudo-benchmark. It creates a single monolithic segment with 500K very short documents (a single field, with norms). The index is ~60MB once created. We then run semi-complex Boolean queries against that index from N concurrent threads. The attached capture-4 shows the result (queries per second over 5-second spans) for a varying number of concurrent threads on an AWS machine with 32 CPUs available (16 of which appear to be physical cores, the other 16 hyper-threads). The red line at the bottom (which falls below single-threaded throughput as threads are added) is the current RAMDirectory. RAMDirectory2 is an alternative implementation I wrote that uses ByteBuffers. Yes, it's slower than the native mmapped implementation, but a lot faster than the current RAMDirectory (and more GC-friendly because it uses dynamic progressive block scaling internally).
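The description credits part of RAMDirectory2's GC-friendliness to "dynamic progressive block scaling", but the issue does not show that code. The following is only a minimal Java sketch of one plausible reading of the idea, assuming blocks that start small and double in size up to a fixed cap. `ProgressiveBlockBuffer` and its methods are hypothetical names, not Lucene's API and not the actual RAMDirectory2 or ByteBuffersDirectory implementation. It also illustrates one property that may matter for concurrent reads: absolute `ByteBuffer.get(int)` calls do not touch any shared stream position.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of "progressive block scaling": an append-only byte store
 * made of ByteBuffer blocks whose capacity doubles until it reaches a cap.
 * Not Lucene code; class and method names are illustrative only.
 */
final class ProgressiveBlockBuffer {
    private final int maxBlockSize;
    private final List<ByteBuffer> blocks = new ArrayList<>();
    // Absolute offset at which each block starts (parallel to 'blocks').
    private final List<Long> blockStarts = new ArrayList<>();
    private int nextBlockSize;
    private long size;

    ProgressiveBlockBuffer(int initialBlockSize, int maxBlockSize) {
        if (initialBlockSize <= 0 || maxBlockSize < initialBlockSize) {
            throw new IllegalArgumentException("require 0 < initialBlockSize <= maxBlockSize");
        }
        this.nextBlockSize = initialBlockSize;
        this.maxBlockSize = maxBlockSize;
    }

    /** Appends one byte. The writer side is single-threaded and not synchronized. */
    void writeByte(byte b) {
        ByteBuffer last = blocks.isEmpty() ? null : blocks.get(blocks.size() - 1);
        if (last == null || !last.hasRemaining()) {
            // Small files get small blocks; large files grow toward maxBlockSize,
            // so no single huge contiguous array is ever allocated.
            last = ByteBuffer.allocate(nextBlockSize);
            blocks.add(last);
            blockStarts.add(size);
            nextBlockSize = (int) Math.min(2L * nextBlockSize, maxBlockSize);
        }
        last.put(b);
        size++;
    }

    /**
     * Reads the byte at an absolute position using ByteBuffer's absolute get(int),
     * which does not modify the buffer's position. Once writing has finished and
     * the object is safely published, many threads can read concurrently without
     * sharing a mutable cursor.
     */
    byte readByte(long pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos=" + pos + ", size=" + size);
        }
        int i = Collections.binarySearch(blockStarts, pos);
        if (i < 0) {
            // binarySearch returns -(insertionPoint) - 1; the containing block
            // is the one just before the insertion point.
            i = -i - 2;
        }
        return blocks.get(i).get((int) (pos - blockStarts.get(i)));
    }

    long size() {
        return size;
    }

    int blockCount() {
        return blocks.size();
    }
}
```

For example, with an initial block size of 4 and a cap of 16, writing 100 bytes allocates blocks of 4, 8, 16, 16, ... bytes, eight blocks in total. The doubling keeps tiny throw-away files cheap, while the cap bounds how large any single allocation can get.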

Attachments

capture-1.png, 118 kB, attached 07/Aug/18 10:26 by Dawid Weiss
capture-4.png, 58 kB, attached 31/Jul/18 12:19 by Dawid Weiss

Issue Links

is a parent of

SOLR-12861 Add Solr factory for new ByteBuffersDirectory

Closed

relates to

LUCENE-8406 Make ByteBufferIndexInput public

Resolved

links to

GitHub Pull Request #432

Sub-Tasks

RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated

Resolved

Dawid Weiss

A ByteBuffer based Directory implementation (and associated classes)

Closed

Dawid Weiss

Remove deprecated RAMDirectory

Closed

Dawid Weiss

100%

People

Assignee: Dawid Weiss

Reporter: Dawid Weiss

Votes: 1

Watchers: 6

Dates

Created: 31/Jul/18 12:33

Updated: 28/Nov/24 10:48

Resolved: 28/Jan/19 12:52

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2h 20m