Lucene - Core / LUCENE-4462

Publishing flushed segments is single threaded and too costly

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA, 4.0-BETA, 4.0
    • Fix Version/s: 4.1, 5.0
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      Spinoff from http://lucene.markmail.org/thread/4li6bbomru35qn7w

      The new TestBagOfPostings failed the build because it timed out after 2 hours, but digging in, I found it was a starvation issue: the 4 threads were flushing segments much faster than the single thread could publish them.

      I think this is because publishing segments (DocumentsWriter.publishFlushedSegment) is actually rather costly (it creates the CFS file if necessary, writes the .si file, etc.).

      I committed a workaround for now, to prevent starvation (see svn diff -c 1394704 https://svn.apache.org/repos/asf/lucene/dev/trunk), but we really should address the root cause by moving these costly ops into flush() so that publishing is a low cost operation.

      Attachments

      1. LUCENE-4462.patch (13 kB, Simon Willnauer)
      2. LUCENE-4462.patch (13 kB, Simon Willnauer)

        Activity

        Commit Tag Bot added a comment:

        [branch_4x commit] Simon Willnauer
        http://svn.apache.org/viewvc?view=revision&revision=1397237

        LUCENE-4462: Flush Deletes, SegmentInfos and build CFS concurrently in DWPT

        Simon Willnauer added a comment:

        backported to 4x in revision 1397237.

        Simon Willnauer added a comment:

        Committed to trunk in revision 1396500

        Simon Willnauer added a comment:

        here is a new patch adding back the safety forcePurge. I will commit this to trunk and let it bake in a bit before I backport. I will keep this issue open until it's ported.

        Simon Willnauer added a comment:

        "I think we should keep the safety in there (the fallback to forcePurge if too many segments are backlogged)...? Hopefully it never needs to run... but just in case."

        I agree; I removed it for beasting. I will add it back and commit. I will let this bake in a bit and then port to 4.x.

        Michael McCandless added a comment:

        Patch looks good, thanks Simon!

        I think we should keep the safety in there (the fallback to forcePurge if too many segments are backlogged)...? Hopefully it never needs to run... but just in case.
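        The safety valve discussed here (falling back to a blocking forcePurge when too many flushed segments are backlogged) can be sketched in plain Java. This is a minimal, hypothetical illustration, not Lucene's code: the class BackloggedPurge, the maxBacklog threshold, and the tryPurge/forcePurge methods are assumptions that only mirror the names used in this discussion. Normally a flushing thread just enqueues its segment and returns; only when the backlog exceeds the threshold does it block and publish everything itself.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a backlog-triggered forcePurge; not Lucene's implementation.
public class BackloggedPurge {
  private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
  private final AtomicInteger backlog = new AtomicInteger();
  private final AtomicInteger forcedPurges = new AtomicInteger();
  private final ReentrantLock purgeLock = new ReentrantLock();
  private final List<String> published = new CopyOnWriteArrayList<>();
  private final int maxBacklog;

  BackloggedPurge(int maxBacklog) {
    this.maxBacklog = maxBacklog;
  }

  // Called by a flushing thread once its segment is fully written.
  void afterFlush(String segment) {
    pending.add(segment);
    if (backlog.incrementAndGet() > maxBacklog) {
      forcePurge(); // safety valve: too far behind, so block and publish now
    }
    // Otherwise return at once and leave publishing to a later tryPurge().
  }

  // Non-blocking: publishes pending segments only if no other thread is purging.
  int tryPurge() {
    if (!purgeLock.tryLock()) {
      return 0;
    }
    try {
      return drain();
    } finally {
      purgeLock.unlock();
    }
  }

  // Blocking: waits for the lock, then publishes everything pending.
  int forcePurge() {
    purgeLock.lock();
    try {
      forcedPurges.incrementAndGet();
      return drain();
    } finally {
      purgeLock.unlock();
    }
  }

  // Caller must hold purgeLock, so only one thread publishes at a time.
  private int drain() {
    int n = 0;
    String segment;
    while ((segment = pending.poll()) != null) {
      published.add(segment); // cheap in-memory publish, no I/O
      backlog.decrementAndGet();
      n++;
    }
    return n;
  }

  List<String> published() {
    return List.copyOf(published);
  }

  int forcedPurges() {
    return forcedPurges.get();
  }

  public static void main(String[] args) {
    BackloggedPurge p = new BackloggedPurge(3);
    for (int i = 0; i < 4; i++) {
      p.afterFlush("_" + i); // the 4th call pushes the backlog past 3
    }
    System.out.println(p.published() + " forced=" + p.forcedPurges()); // prints [_0, _1, _2, _3] forced=1
  }
}
```

        The split between a non-blocking tryLock path and a blocking lock path keeps the common case cheap for flushing threads, while the threshold bounds how far publishing can fall behind; as in the comment, the hope is that the forced path rarely or never runs.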

        Simon Willnauer added a comment:

        Here is a patch that basically moves prepareFlushedSegment into DWPT and calls it once we are done flushing the segment. The publish call no longer does any IO, which is good, so it should not be a bottleneck anymore. I could imagine this also being a perf win for anyone using CFS.
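        The idea of doing the costly preparation on each flushing thread (DWPT, i.e. DocumentsWriterPerThread) so that the single publisher only does in-memory bookkeeping can be sketched in plain Java. This is a hypothetical illustration, not Lucene's code: FlushPipeline, PreparedSegment, flushAndPrepare, and the checksum loop standing in for building the CFS and writing the .si file are all assumptions. The sketch also ignores publish ordering, which a real implementation would likely need to preserve.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: costly work runs on the flushing threads, publishing is cheap.
// None of these types are Lucene classes.
public class FlushPipeline {

  // A flushed segment whose expensive preparation is already finished.
  record PreparedSegment(String name, long checksum) {}

  private final ConcurrentLinkedQueue<PreparedSegment> ready = new ConcurrentLinkedQueue<>();
  private final List<String> published = new ArrayList<>(); // guarded by "this"

  // Runs on each flushing thread, so this work proceeds in parallel.
  void flushAndPrepare(String name) {
    long checksum = 0;
    for (int i = 0; i < 100_000; i++) {
      // Stand-in for the costly steps (building the CFS, writing .si, etc.).
      checksum = checksum * 31 + name.hashCode() + i;
    }
    ready.add(new PreparedSegment(name, checksum));
  }

  // Single publisher: only in-memory bookkeeping, no I/O, so it stays cheap.
  synchronized int publishAll() {
    int n = 0;
    PreparedSegment seg;
    while ((seg = ready.poll()) != null) {
      published.add(seg.name());
      n++;
    }
    return n;
  }

  synchronized List<String> published() {
    return new ArrayList<>(published);
  }

  public static void main(String[] args) throws InterruptedException {
    FlushPipeline pipeline = new FlushPipeline();
    ExecutorService flushers = Executors.newFixedThreadPool(4); // 4 flushing threads
    for (int i = 0; i < 20; i++) {
      String name = "_" + i;
      flushers.submit(() -> pipeline.flushAndPrepare(name));
    }
    flushers.shutdown();
    flushers.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println(pipeline.publishAll()); // prints 20
  }
}
```

        Because publishAll only drains a queue into a list, a single publishing thread can keep up with many flushing threads; the expensive loop scales with the number of flushers instead of serializing behind the publisher.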

          People

          • Assignee: Simon Willnauer
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 3
