Lucene - Core / LUCENE-1526

For near real-time search, use paged copy-on-write BitVector impl

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      SegmentReader currently uses a BitVector to represent deleted docs.
      When performing rapid clone (see LUCENE-1314) and delete operations,
      a copy-on-write of the BitVector can become costly because the
      entire underlying byte array must be allocated and copied. One way
      to make this clone-and-delete process faster is to implement
      tombstones, a term coined by Marvin Humphrey. In the current reader,
      tombstones represent the new deletions plus the incremental
      deletions carried over from previously reopened readers.
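      The issue title points at a related approach: a paged copy-on-write
      bit vector, where a clone shares every page and a delete copies only
      the page it touches. The sketch below illustrates that idea under
      stated assumptions. The class name PagedCowBitVector and the
      4096-bit page size are hypothetical, and this is neither Lucene's
      BitVector nor the code in the attached patch.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene's BitVector): bits are stored in fixed-size pages.
 * Cloning copies only the array of page references; a page is duplicated the first
 * time either instance writes to it after the clone.
 */
final class PagedCowBitVector {
    private static final int BITS_PER_PAGE = 4096;              // assumed page size: 512 bytes
    private static final int WORDS_PER_PAGE = BITS_PER_PAGE / 64;

    private final int size;
    private final long[][] pages;  // page references, possibly shared with clones
    private final boolean[] owned; // owned[p]: this instance may modify pages[p] in place

    PagedCowBitVector(int size) {
        this.size = size;
        int numPages = (size + BITS_PER_PAGE - 1) / BITS_PER_PAGE;
        this.pages = new long[numPages][WORDS_PER_PAGE];
        this.owned = new boolean[numPages];
        Arrays.fill(owned, true);
    }

    private PagedCowBitVector(int size, long[][] sharedPages) {
        this.size = size;
        this.pages = sharedPages;
        this.owned = new boolean[sharedPages.length]; // all false: every page is shared
    }

    int size() {
        return size;
    }

    boolean get(int doc) {
        long word = pages[doc / BITS_PER_PAGE][(doc % BITS_PER_PAGE) >>> 6];
        return (word & (1L << (doc & 63))) != 0;
    }

    void set(int doc) {
        int p = doc / BITS_PER_PAGE;
        if (!owned[p]) {
            pages[p] = pages[p].clone(); // copy-on-write: duplicate only this page
            owned[p] = true;
        }
        pages[p][(doc % BITS_PER_PAGE) >>> 6] |= 1L << (doc & 63);
    }

    /** Shares all pages with the returned copy; neither side writes them in place again. */
    @Override
    public PagedCowBitVector clone() {
        Arrays.fill(owned, false);
        return new PagedCowBitVector(size, pages.clone());
    }
}
```

      With this layout, a clone costs one copy of the page-reference
      array, and the first delete on each page afterwards copies 512 bytes
      rather than the whole byte array. The trade-off is one extra
      indirection on every get.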

      The proposed implementation of tombstones accumulates deletions
      into an int array represented as a DocIdSet. With LUCENE-1476,
      SegmentTermDocs iterates over deleted docs using a DocIdSet rather
      than calling get on the BitVector. This allows a BitVector and a set
      of tombstones to be combined as the current reader's deleted docs:
      a document is deleted if either one of them marks it.
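      The combination can be sketched as follows, under several
      assumptions. java.util.BitSet stands in for Lucene's BitVector. The
      class TombstonedDeletes and its methods are hypothetical, and this
      is not the LUCENE-1476 DocIdSet API. The tombstones are kept as a
      sorted int[]. nextDeleted walks both sources in doc ID order, which
      is roughly what an iterator over the combined deleted docs would
      need to do.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch: deleted docs = a base bit vector that is never mutated after a
 * clone, plus a sorted int[] of tombstones accumulated since that vector was built.
 * java.util.BitSet stands in for Lucene's BitVector.
 */
final class TombstonedDeletes {
    private final BitSet base;      // deletions already folded into a bit vector
    private final int[] tombstones; // newer deletions, sorted ascending, no duplicates

    TombstonedDeletes(BitSet base, int[] tombstones) {
        this.base = base;
        this.tombstones = tombstones;
    }

    /** A doc is deleted if either source marks it (set union). */
    boolean isDeleted(int doc) {
        return base.get(doc) || Arrays.binarySearch(tombstones, doc) >= 0;
    }

    /** Next deleted doc >= from, or -1 if none; merges both sources in doc ID order. */
    int nextDeleted(int from) {
        int fromBase = base.nextSetBit(from);          // -1 if no set bit remains
        int i = Arrays.binarySearch(tombstones, from);
        if (i < 0) i = -i - 1;                         // convert to insertion point
        int fromTomb = i < tombstones.length ? tombstones[i] : -1;
        if (fromBase < 0) return fromTomb;
        if (fromTomb < 0) return fromBase;
        return Math.min(fromBase, fromTomb);
    }

    /** Returns a new instance with one more tombstone; the base vector is shared, not copied. */
    TombstonedDeletes delete(int doc) {
        if (isDeleted(doc)) return this;
        int i = -Arrays.binarySearch(tombstones, doc) - 1; // doc is absent, so this is its slot
        int[] next = new int[tombstones.length + 1];
        System.arraycopy(tombstones, 0, next, 0, i);
        next[i] = doc;
        System.arraycopy(tombstones, i, next, i + 1, tombstones.length - i);
        return new TombstonedDeletes(base, next);
    }
}
```

      In this sketch, a delete after a clone copies only the tombstone
      array, whose size grows with the number of recent deletions, never
      with maxDoc. A real implementation might append into a growable
      buffer instead of copying on every delete.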

      A tombstone merge policy needs to be defined to determine when to
      merge tombstone DocIdSets into a new deleted docs BitVector, because
      too many tombstones would eventually degrade performance. A
      probable implementation will merge tombstones based on the number of
      tombstone sets and the total number of documents they contain. The
      merge policy may be set in the clone/reopen methods or on the
      IndexReader.
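      One possible shape for such a policy is sketched below. The class
      TombstoneMergePolicy is hypothetical. Its two thresholds are
      illustrative assumptions, not values proposed in this issue: a
      maximum number of tombstone sets, and a maximum fraction of maxDoc
      covered by tombstones. BitSet again stands in for BitVector.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical tombstone merge policy: fold the tombstone sets into a fresh bit vector
 * once there are too many sets or they cover too many documents. Both thresholds are
 * illustrative assumptions.
 */
final class TombstoneMergePolicy {
    private final int maxTombstoneSets;
    private final double maxTombstoneFraction;

    TombstoneMergePolicy(int maxTombstoneSets, double maxTombstoneFraction) {
        this.maxTombstoneSets = maxTombstoneSets;
        this.maxTombstoneFraction = maxTombstoneFraction;
    }

    /** Decides based on the number of sets and the total docs they hold (an upper bound if sets overlap). */
    boolean shouldMerge(List<int[]> tombstoneSets, int maxDoc) {
        long totalDocs = 0;
        for (int[] set : tombstoneSets) totalDocs += set.length;
        return tombstoneSets.size() > maxTombstoneSets
            || totalDocs > maxTombstoneFraction * maxDoc;
    }

    /** Copies the base vector once and sets every tombstoned doc in the copy. */
    static BitSet merge(BitSet base, List<int[]> tombstoneSets) {
        BitSet merged = (BitSet) base.clone();
        for (int[] set : tombstoneSets) {
            for (int doc : set) merged.set(doc);
        }
        return merged;
    }
}
```

      For example, new TombstoneMergePolicy(10, 0.01) would fold
      tombstones once there are more than 10 sets or they cover more than
      1% of the segment's documents. Merging pays the full-array copy once,
      instead of on every clone.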

      Attachments

        1. LUCENE-1526.patch (28 kB, Jason Rutherglen)
        2. LUCENE-1526.patch (18 kB, Jason Rutherglen)

          People

            Assignee: Unassigned
            Reporter: Jason Rutherglen
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 168h
                Remaining Estimate: 168h
                Time Spent: Not Specified