Lucene - Core / LUCENE-1316

Avoidable synchronization bottleneck in MatchAllDocsQuery$MatchAllScorer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.9
    • Component/s: core/query/scoring
    • Labels:
      None
    • Environment:

      All

    • Lucene Fields:
      New

      Description

      The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that has not received attention (at least in the threads I read).

      In every case I saw where a stack trace was provided to show the lock/block, the MatchAllScorer.next() method appears higher in the stack. In Solr particularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (an order of magnitude worse) on our load tests for NOT queries because of this bottleneck. The problem is that every single document is run through isDeleted(), which is synchronized. Having an optimized index exacerbates this issue, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.

      By simply having the MatchAllScorer check whether the reader has any deletions, much of this can be avoided, especially in a read-only production environment where slaves do all the high-load searching.

      I modified line 67 in the MatchAllDocsQuery
      FROM:
      if (!reader.isDeleted(id)) {
      TO:
      if (!reader.hasDeletions() || !reader.isDeleted(id)) {
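
      To illustrate why the short-circuit helps, here is a minimal, self-contained sketch. It does not use Lucene: FakeReader, countOriginal and countPatched are hypothetical names made up for this example, and the sketch assumes that hasDeletions() is a cheap, unsynchronized check while isDeleted() takes the reader's lock, as described above.

      ```java
      import java.util.BitSet;

      public class MatchAllSketch {

          /** Hypothetical stand-in for a segment reader; NOT Lucene's actual class. */
          public static class FakeReader {
              private final int maxDoc;
              private final BitSet deleted;   // null when the segment has no deletions
              public int lockedCalls = 0;     // how often the synchronized method was entered

              public FakeReader(int maxDoc, BitSet deleted) {
                  this.maxDoc = maxDoc;
                  this.deleted = deleted;
              }

              public int maxDoc() { return maxDoc; }

              // Assumed to be unsynchronized, so threads never queue up here.
              public boolean hasDeletions() { return deleted != null; }

              // Synchronized, so concurrent searches serialize on this reader.
              public synchronized boolean isDeleted(int id) {
                  lockedCalls++;
                  return deleted != null && deleted.get(id);
              }
          }

          /** Original check: every document goes through the lock. */
          public static int countOriginal(FakeReader reader) {
              int live = 0;
              for (int id = 0; id < reader.maxDoc(); id++) {
                  if (!reader.isDeleted(id)) live++;
              }
              return live;
          }

          /** Patched check: the lock is skipped when the reader has no deletions. */
          public static int countPatched(FakeReader reader) {
              int live = 0;
              for (int id = 0; id < reader.maxDoc(); id++) {
                  if (!reader.hasDeletions() || !reader.isDeleted(id)) live++;
              }
              return live;
          }

          public static void main(String[] args) {
              FakeReader a = new FakeReader(1000, null);
              FakeReader b = new FakeReader(1000, null);
              System.out.println("original: " + countOriginal(a) + " docs, " + a.lockedCalls + " locked calls");
              System.out.println("patched:  " + countPatched(b) + " docs, " + b.lockedCalls + " locked calls");
          }
      }
      ```

      In this sketch both loops return the same count, but the patched loop never enters the synchronized method when there are no deletions. When deletions exist, the patched loop still takes the lock once per document.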

      In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.

      Please consider making this adjustment for a future bug fix release.

      1. MatchAllDocsQuery.java
        4 kB
        Todd Feak
      2. LUCENE-1316.patch
        7 kB
        Jason Rutherglen
      3. LUCENE-1316.patch
        10 kB
        Jason Rutherglen
      4. LUCENE-1316.patch
        17 kB
        Michael McCandless
      5. LUCENE_1316.patch
        6 kB
        Yonik Seeley
      6. LUCENE_1316.patch
        6 kB
        Yonik Seeley
      7. LUCENE_1316.patch
        9 kB
        Yonik Seeley

        Activity

        Field: Original Value -> New Value

        Mark Thomas made changes -
        Workflow: Default workflow, editable Closed status [ 12564690 ] -> jira [ 12585027 ]
        Mark Thomas made changes -
        Workflow: jira [ 12434042 ] -> Default workflow, editable Closed status [ 12564690 ]
        Mark Miller made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Michael McCandless made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Resolution: (empty) -> Fixed [ 1 ]
        Michael McCandless made changes -
        Attachment: (empty) -> LUCENE-1316.patch [ 12398682 ]
        Michael McCandless made changes -
        Assignee: (empty) -> Michael McCandless [ mikemccand ]
        Jason Rutherglen made changes -
        Attachment: (empty) -> LUCENE-1316.patch [ 12398086 ]
        Jason Rutherglen made changes -
        Attachment: (empty) -> LUCENE-1316.patch [ 12397819 ]
        Michael McCandless made changes -
        Fix Version/s: (empty) -> 2.9 [ 12312682 ]
        Yonik Seeley made changes -
        Attachment: (empty) -> LUCENE_1316.patch [ 12384862 ]
        Yonik Seeley made changes -
        Attachment: (empty) -> LUCENE_1316.patch [ 12384851 ]
        Yonik Seeley made changes -
        Attachment: (empty) -> LUCENE_1316.patch [ 12384773 ]
        Todd Feak made changes -
        Attachment: (empty) -> MatchAllDocsQuery.java [ 12384679 ]
        Todd Feak created issue -

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Todd Feak
          • Votes:
            1
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              1h
              Remaining:
              1h
              Logged:
              Not Specified
