Description
I hit this while testing various use cases for LUCENE-6119 (adding auto-throttle to ConcurrentMergeScheduler).
When I tested "always call updateDocument" (each add buffers a delete term) with many indexing threads, opening an NRT reader once per second (forcing all deleted terms to be applied), I saw that BufferedUpdatesStream.applyDeletes sometimes takes a very long time, e.g.:
BD 0 [2015-01-04 09:31:12.597; Lucene Merge Thread #69]: applyDeletes took 339 msec for 10 segments, 117 deleted docs, 607333 visited terms
BD 0 [2015-01-04 09:31:18.148; Thread-4]: applyDeletes took 5533 msec for 62 segments, 10989 deleted docs, 8517225 visited terms
BD 0 [2015-01-04 09:31:21.463; Lucene Merge Thread #71]: applyDeletes took 1065 msec for 10 segments, 470 deleted docs, 1825649 visited terms
BD 0 [2015-01-04 09:31:26.301; Thread-5]: applyDeletes took 4835 msec for 61 segments, 14676 deleted docs, 9649860 visited terms
BD 0 [2015-01-04 09:31:35.572; Thread-11]: applyDeletes took 6073 msec for 72 segments, 13835 deleted docs, 11865319 visited terms
BD 0 [2015-01-04 09:31:37.604; Lucene Merge Thread #75]: applyDeletes took 251 msec for 10 segments, 58 deleted docs, 240721 visited terms
BD 0 [2015-01-04 09:31:44.641; Thread-11]: applyDeletes took 5956 msec for 64 segments, 15109 deleted docs, 10599034 visited terms
BD 0 [2015-01-04 09:31:47.814; Lucene Merge Thread #77]: applyDeletes took 396 msec for 10 segments, 137 deleted docs, 719914 visited terms
What this means is that even though I want an NRT reader every second, I often don't get one for ~7 seconds or more.
This is on an SSD; the machine has 48 GB RAM, the heap is only 2 GB, and there are 12 indexing threads.
As hideously complex as this code is, I think there are some inefficiencies, but fixing them could be hard and could make the code even hairier...
Also, this code is mega-locked: it holds IW's lock and BD's lock, so it blocks things like merges kicking off or finishing...
E.g., we pull the MergedIterator many times on the same set of sub-iterators. Maybe we can create the sorted terms up front and reuse that?
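To make that concrete, here is a minimal sketch (not the actual BufferedUpdatesStream code; ReusableDeleteTerms and mergeOnce are made-up names): merge the sorted per-packet term iterators a single time, up front, so the result can be replayed against every segment instead of pulling a fresh MergedIterator over the same sub-iterators for each segment.
{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.util.MergedIterator;

class ReusableDeleteTerms {

  /**
   * Merge the per-update-packet term iterators once, up front. The resulting
   * sorted list can then be replayed against each segment, instead of
   * constructing a new MergedIterator per segment.
   */
  @SafeVarargs
  static List<Term> mergeOnce(Iterator<Term>... perPacketTerms) {
    List<Term> merged = new ArrayList<>();
    // true = remove duplicates across packets (the removeDuplicates flag
    // mentioned below)
    Iterator<Term> it = new MergedIterator<>(true, perPacketTerms);
    while (it.hasNext()) {
      merged.add(it.next());
    }
    return merged;
  }
}
{code}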
Maybe we should go "term stride" (one term visits all N segments) not "segment stride" (visit each segment, iterating all deleted terms for it). Just iterating the terms to be deleted takes a sizable part of the time, and we now do that once for every segment in the index.
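A hedged sketch of the two loop orders; SegmentState and applyTermToSegment are hypothetical placeholders, not the real applyDeletes internals:
{code:java}
import java.util.List;

import org.apache.lucene.index.Term;

class DeleteStrides {

  interface SegmentState {}

  /** Placeholder: seek the term in this segment and mark matching docs deleted. */
  static void applyTermToSegment(Term term, SegmentState segment) {}

  /**
   * "Segment stride" (what the current code does): every segment re-walks the
   * full set of deleted terms, so the term iteration cost is paid once per segment.
   */
  static void segmentStride(List<SegmentState> segments, List<Term> deleteTerms) {
    for (SegmentState segment : segments) {
      for (Term term : deleteTerms) {
        applyTermToSegment(term, segment);
      }
    }
  }

  /**
   * "Term stride" (the idea above): each deleted term is visited once and
   * applied across all segments, so the terms are iterated only one time.
   */
  static void termStride(List<SegmentState> segments, List<Term> deleteTerms) {
    for (Term term : deleteTerms) {
      for (SegmentState segment : segments) {
        applyTermToSegment(term, segment);
      }
    }
  }
}
{code}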
Also, the "isUnique" bit in LUCENE-6005 should help here: if we know the field is unique, we can stop calling seekExact once we find a segment that has the deleted term, and we can maybe pass false for removeDuplicates to MergedIterator...
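Building on the term-stride idea, a hedged sketch of that early exit (again with hypothetical placeholder types and helpers, not real Lucene APIs):
{code:java}
import java.util.List;

import org.apache.lucene.index.Term;

class UniqueFieldDeletes {

  interface SegmentState {}

  /** Placeholder for a per-segment TermsEnum.seekExact probe. */
  static boolean termExistsInSegment(Term term, SegmentState segment) {
    return false;
  }

  /** Placeholder: mark the matching doc deleted in this segment. */
  static void applyTermToSegment(Term term, SegmentState segment) {}

  /**
   * If the delete field is known to be unique, at most one segment can hold a
   * live doc for the term, so we can stop probing segments as soon as the
   * seekExact probe succeeds once.
   */
  static void applyUniqueFieldDeletes(List<SegmentState> segments, List<Term> deleteTerms) {
    for (Term term : deleteTerms) {
      for (SegmentState segment : segments) {
        if (termExistsInSegment(term, segment)) {
          applyTermToSegment(term, segment);
          break; // unique field: no need to check the remaining segments
        }
      }
    }
  }
}
{code}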
Issue Links
- is related to SOLR-6838: Bulk loading with the default of updateDocument blocks all indexing for long periods of time (Open)