Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-2161...

      The hard throttling CMS does (blocking the incoming thread that wants
      to launch a new merge) can be devastating when it strikes during NRT
      reopen.

      It can easily happen if a huge merge is off and running, but then a
      tiny merge is needed to clean up recently created segments due to
      frequent reopens.

      I think a small change to CMS, whereby it assigns a higher thread
      priority to tiny merges than big merges, should allow us to increase
      the max merge thread count again, and greatly reduce the chance that
      NRT's reopen would hit this.

      1. LUCENE-2164.patch
        16 kB
        Michael McCandless
      2. LUCENE-2164.patch
        5 kB
        Michael McCandless

        Activity

        Jason Rutherglen added a comment -

        This issue illustrates why the ram dir approach can be useful,
        because small segment merges compete with large segment merges
        for IO, which can spike the turnaround time. With a ram dir, the
        small segments are held in RAM until they're large enough to be
        placed onto disk. They can then be given the same IO priority as
        the other merging segments which should result in consistent
        reopen times.

        Michael McCandless added a comment -

        Attached patch.

        Whenever CMS starts a new thread, or an existing thread pulls a new merge to run, it updates the thread priorities (smaller merges get higher thread priority).

        I also increased the default max thread count from 1 to 2.
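        As a rough illustration of the approach (a simplified sketch, not the attached patch; MergeThread and its fields here are illustrative), re-ranking the active merge threads so the smallest merge gets the highest thread priority could look like:

```java
import java.util.*;

// Illustrative sketch (not the actual patch): threads running smaller
// merges get a higher Java thread priority than threads running big ones.
public class MergeThreadPriorities {

    static class MergeThread {
        final Thread thread;
        final long mergeSizeBytes; // estimated total size of the merge

        MergeThread(Thread thread, long mergeSizeBytes) {
            this.thread = thread;
            this.mergeSizeBytes = mergeSizeBytes;
        }
    }

    /** Re-rank all active merge threads: smallest merge ends up with the top priority. */
    static void updatePriorities(List<MergeThread> active) {
        // Sort largest merges first, so the smallest merge is assigned last
        // and therefore receives the highest priority.
        active.sort((a, b) -> Long.compare(b.mergeSizeBytes, a.mergeSizeBytes));
        int pri = Thread.MIN_PRIORITY;
        for (MergeThread mt : active) {
            mt.thread.setPriority(pri);
            pri = Math.min(Thread.MAX_PRIORITY, pri + 1);
        }
    }
}
```

        This mirrors the comment's description (CMS re-runs the ranking whenever a thread is started or pulls a new merge), though note Java thread priorities are only a hint to the OS scheduler.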

        Michael McCandless added a comment -

        This issue illustrates why the ram dir approach can be useful,
        because small segment merges compete with large segment merges
        for IO, which can spike the turnaround time. With a ram dir, the
        small segments are held in RAM until they're large enough to be
        placed onto disk. They can then be given the same IO priority as
        the other merging segments which should result in consistent
        reopen times.

        I'm still not convinced we should game the OS, here.

        Ie, the small segments are likely still in the IO cache, so
        we're probably not really competing for IO on the smaller merges.

        I think it's when an immense merge is running that we're in trouble.
        EG I see very long NRT reopen times when such a merge is running.

        I've been wondering whether we need to take IO prioritization into our
        own hands. EG, instead of relying [only] on thread priorities for
        CMS, somehow have the big merge pause until the small merge can
        complete. This would really be the best way to implement the "allow
        at most 1 merge to run at once". I guess we may be able to override
        the mergeAbort to implement this...

        I think a similar "emulate IO prioritization" may help when, eg, a
        large merge/optimize is interfering with ongoing searches. Yes, some
        of the cost is because the merge is evicting IO cache, but for large
        indexes that don't fit in RAM, there must be some cost to sharing the
        disk heads, too...

        Hoss Man added a comment -

        I also increased the default max thread count from 1 to 2.

        random thought from the peanut gallery: do we want to go down the "Ergonomics" route and make the default number of threads vary based on Runtime.getRuntime().availableProcessors()?

        (ie: 1 on a single-threaded box, NUM_PROCESSORS/CONSTANT on multithreaded boxes?)

        Jason Rutherglen added a comment -

        I've been wondering whether we need to take IO
        prioritization into our own hands.

        I think we discussed IO throttling at the directory level
        before. A controller could be passed to createOutput that has a
        setPriority method. The priority level would be set based on the
        demand for IO from IndexWriter. SegmentMerger could know if it's
        creating a large segment from the sizes of the source segments,
        and set the IO priority accordingly?

        I wonder what algorithm would be suitable for this?
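        A minimal sketch of what such a controller might look like (purely hypothetical: IOPriorityController and its methods are not an existing Lucene API), where lower priority translates into longer pauses around each write:

```java
// Hypothetical sketch of the proposed controller (none of these types
// exist in Lucene): a handle passed alongside createOutput whose
// priority, set by IndexWriter/SegmentMerger, throttles low-priority IO.
class IOPriorityController {
    private volatile int priority = Thread.NORM_PRIORITY; // 1..10, higher = more IO

    void setPriority(int priority) {
        this.priority = priority;
    }

    /** Pause length per write; crude linear mapping from priority to millis. */
    int pauseMillis() {
        return Math.max(0, Thread.NORM_PRIORITY - priority);
    }

    /** Called by the output stream around each write; low priority pauses longer. */
    void maybePause() throws InterruptedException {
        int pause = pauseMillis();
        if (pause > 0) {
            Thread.sleep(pause);
        }
    }
}
```

        SegmentMerger could set a low priority when the source segments are large, per the comment; the actual mapping from demand to priority is the open question raised here.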

        Michael McCandless added a comment -

        do we want to go down the "Ergonomics" route and make the default number of threads vary based on Runtime.getRuntime().availableProcessors()

        Good idea! How about something like this:

        max(1, min(3, numberOfCores / 2))
        

        Ie, number of cores divided by 2, but floored at 1 and ceiling'd at 3?

        I think 1 BG thread is always good since you gain CPU + IO
        concurrency. Too many threads (I'm guessing above 3) will swamp the
        IO system (unless the IO system is an SSD).

        This would just be a dynamic default, ie, if you call
        setMaxThreadCount yourself, you override it.
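        The proposed default can be sketched directly (illustrative helper, not Lucene's actual code):

```java
// Dynamic default for CMS maxThreadCount, as proposed in the comment:
// number of cores divided by 2, floored at 1 and capped at 3.
public class MergeDefaults {
    static int defaultMaxThreadCount(int numberOfCores) {
        return Math.max(1, Math.min(3, numberOfCores / 2));
    }
}
```

        So a 1-core box gets 1 merge thread, a 4-core box gets 2, and anything with 6+ cores gets 3, unless setMaxThreadCount overrides it.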

        Michael McCandless added a comment -

        I think we discussed IO throttling at the directory level
        before. A controller could be passed to createOutput that has a
        setPriority method. The priority level would be set based on the
        demand for IO from IndexWriter. SegmentMerger could know if it's
        creating a large segment from the sizes of the source segments,
        and set the IO priority accordingly?

        Yeah we talked about this for the more general case (throttling
        merging when searches are running). That's a pretty big change
        though.

        I think for this case we could handle the throttling entirely in CMS,
        by wrapping the OneMerge and overriding the checkAborted method to
        pause the large merge when smaller merge(s) are running. This would
        force the long merges to stop so that the smaller merges could
        complete without fighting for IO.

        To get a sense of how important this would be, I think we need to
        measure how long it takes a merge to complete when it's alone vs when
        another merge is running at the same time, on a machine with enough
        cores, for a "typical" IO system. If they do cause one another to
        thrash (seems likely, though, I've also seen evidence that merging is
        CPU bound, at least for merging the postings...) then we should only
        allow 1 to run at once...
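        The pause-in-checkAborted idea can be sketched roughly like this (simplified wiring, not the actual CMS patch; PausableMerge is an illustrative stand-in for a wrapped OneMerge):

```java
// Rough sketch of pausing a big merge from inside the checkAborted-style
// callback that merging code invokes periodically. CMS would call
// setPaused(true) on the large merge while smaller merges are running.
class PausableMerge {
    private final Object lock = new Object();
    private boolean paused;

    void setPaused(boolean paused) {
        synchronized (lock) {
            this.paused = paused;
            lock.notifyAll();
        }
    }

    /** Called periodically by the merge loop; blocks while paused. */
    void checkPaused() throws InterruptedException {
        synchronized (lock) {
            while (paused) {
                lock.wait(1000); // re-check periodically in case of a missed signal
            }
        }
    }
}
```

        Because the merge loop already polls for abort regularly, piggybacking a pause check on that hook stops the big merge quickly without any Directory-level changes.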

        Michael McCandless added a comment -

        Actually, I'd like to explore the throttling, just within CMS, with this issue.

        I have a crude prototype working and it's actually quite simple... I think this is a good way to force the prioritization of different threads.

        Jason Rutherglen added a comment -

        max(1, min(3, numberOfCores / 2))... This would just be a dynamic default, ie, if you call setMaxThreadCount yourself, you override it.

        This is a good idea as a default.

        Michael McCandless added a comment -

        Attached patch:

        • Adds new CMS.setMaxMergeCount, which must be greater than
          maxThreadCount, allowing CMS to pause big threads so small threads
          can finish
        • Uses dynamic default for CMS.maxThreadCount, to between 1 & 3
          depending on number of cores; defaults maxMergeCount to that
          number +2

        The pausing works well in the NRT stress test – it greatly reduces how
        often the NRT reopen is blocked because of too many merges running.
        It's most important when there is a very big merge running – in that
        case it's better to pause & unpause that big merge when tiny merges
        arrive than to force NRT to wait for the completion of the merge.


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            0
