Lucene - Core / LUCENE-3841

CloseableThreadLocal does not work well with Tomcat thread pooling

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: core/other
    • Labels:
      None
    • Environment:

      Lucene/Tika/Snowball running in a Tomcat web application

    • Lucene Fields:
      New

      Description

      We tracked down a large memory leak (effectively a leak, anyway) caused
      by how Analyzer uses CloseableThreadLocal.
      CloseableThreadLocal.hardRefs holds references to Thread objects as
      keys. The problem is that it only frees these references in the set()
      method, and SnowballAnalyzer will only call set() when it is used by a
      NEW thread.

      The problem scenario is as follows:

      The server experiences a spike in usage (say by robots or whatever)
      and many threads are created and referenced by
      CloseableThreadLocal.hardRefs. The server quiesces and lets many of
      these threads expire normally. Now we have a smaller, but adequate
      thread pool. So CloseableThreadLocal.set() may not be called by
      SnowballAnalyzer (via Analyzer) for a long time. The purge code is
      never called, and these threads, along with their thread-local storage
      (Lucene-related or not), are never cleaned up.
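The leak pattern described above can be sketched as follows. This is a simplified model of the 3.5-era behavior, not Lucene's actual source; class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the problem: hard references to Thread keys are
// purged only inside set(), never in get(). If pooled threads keep
// hitting get(), entries for expired threads linger indefinitely.
class LeakyThreadLocal<T> {
    private final ThreadLocal<T> local = new ThreadLocal<>();
    // Hard refs keep dead Threads (and all their thread-local storage) alive.
    private final Map<Thread, T> hardRefs = new HashMap<>();

    public synchronized T get() {
        return local.get(); // no purge here: dead threads are never removed
    }

    public synchronized void set(T value) {
        local.set(value);
        // Purge happens ONLY here, so a long stretch with no new thread
        // calling set() means no cleanup at all.
        hardRefs.keySet().removeIf(t -> !t.isAlive());
        hardRefs.put(Thread.currentThread(), value);
    }

    synchronized int trackedThreads() {
        return hardRefs.size();
    }
}
```

In the spike-then-quiesce scenario, every expired pool thread stays in `hardRefs` until some surviving thread happens to call `set()` for the first time.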

      I think calling the purge code in both get() and set() would have
      avoided this problem, but is potentially expensive. Perhaps using
      WeakHashMap instead of HashMap may also have helped. WeakHashMap
      purges on get() and set(). So this might be an efficient way to
      clean up threads in get(), while set() might do the more expensive
      Map.keySet() iteration.

      Our current workaround is to not share SnowballAnalyzer instances
      among HTTP searcher threads. We open and close one on every request.
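That workaround might look like the following sketch. `CloseableAnalyzer` stands in for Lucene's analyzer here so the example is self-contained; the names are illustrative, not the reporter's actual code. The point is the open-use-close lifecycle per request:

```java
import java.util.function.Supplier;

// Stand-in for a closeable analyzer such as SnowballAnalyzer.
interface CloseableAnalyzer extends AutoCloseable {
    String[] tokenize(String text);
    @Override void close(); // narrowed: no checked exception
}

class PerRequestAnalysis {
    // Factory stands in for "new SnowballAnalyzer(...)" per request.
    private final Supplier<CloseableAnalyzer> factory;

    PerRequestAnalysis(Supplier<CloseableAnalyzer> factory) {
        this.factory = factory;
    }

    String[] handleRequest(String text) {
        // A fresh analyzer per request: nothing accumulates in thread
        // locals tied to the container's pooled threads.
        try (CloseableAnalyzer analyzer = factory.get()) {
            return analyzer.tokenize(text);
        }
    }
}
```

This trades per-request construction cost for bounded memory, which is why the instantiation cost of the analyzer matters in the comments below.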

      Thanks,
      Matt

      1. LUCENE-3841.patch
        10 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Thanks Matthew!

        Michael McCandless added a comment -

        I think the analyzers are going to be heavy.

        If we start going down the path of trying to speed up their instantiation time, then I vote to remove reusable tokenstreams completely.

        That is: I don't think we should suffer the 'worst of both worlds'. Either we go through all the effort to make things reusable, or we don't
        and instead worry about instantiation time, etc.

        Sorry, you're right: it's fine for the Analyzer to be heavy/slow to
        init/etc., as long as the TokenStreams then share static state.

        So, what I meant is: are the TokenStreams returned from
        SnowballAnalyzer somehow not sharing static stuff...? Or, why would
        they be so heavy...?

        Michael McCandless added a comment -

        Patch.

        I cut over to WeakHashMap for the hard refs, and now purge periodically
        from both set and get. The purge frequency is 20X the number of
        threads in the map, so that the amortized cost remains linear.

        I also stopped using CTL in PagedBytes, and added some additional
        tests for it.
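The amortized purge described in the patch can be sketched as below. This is a simplification of the idea, not the committed code; the essential points are weak keys plus a purge reachable from both get() and set(), rescheduled at 20X the map size so the O(size) sweep stays linear amortized:

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of the fix: WeakHashMap keys plus a countdown-driven purge
// from BOTH get() and set().
class PurgingThreadLocal<T> {
    private final ThreadLocal<T> local = new ThreadLocal<>();
    private final Map<Thread, T> hardRefs = new WeakHashMap<>();
    private int countUntilPurge = 20;

    public synchronized T get() {
        maybePurge();
        return local.get();
    }

    public synchronized void set(T value) {
        local.set(value);
        hardRefs.put(Thread.currentThread(), value);
        maybePurge();
    }

    private void maybePurge() {
        if (--countUntilPurge == 0) {
            hardRefs.keySet().removeIf(t -> !t.isAlive());
            // Reschedule at 20X the current thread count: the O(size)
            // sweep is paid for by ~20*size cheap operations, keeping
            // the amortized cost per operation constant.
            countUntilPurge = Math.max(1, 20 * hardRefs.size());
        }
    }
}
```

Unlike the original, a server that quiesces into a steady stream of get() calls still reaches the purge, so expired pool threads are eventually released even if set() is never called again.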

        Robert Muir added a comment -

        Separately: maybe SnowballAnalyzer is too heavy...? Does it have some static data that ought to be loaded once and shared across analyzers... but isn't today?

        I think the analyzers are going to be heavy.

        If we start going down the path of trying to speed up their instantiation time, then I vote to remove reusable tokenstreams completely.

        That is: I don't think we should suffer the 'worst of both worlds'. Either we go through all the effort to make things reusable, or we don't
        and instead worry about instantiation time, etc.

        Michael McCandless added a comment -

        I think it should be safe to use a WeakHashMap for the hardRefs instead of HashMap?

        This way, if a thread has finished and its Thread object is otherwise GCable, the entries in hardRefs should be cleared... though, it's not clear to me precisely when they will be cleared. If it's only on future access to the WeakHashMap (get or set), which seems likely because I think WeakHashMap uses a WeakReference for the keys and therefore won't really remove an entry until it's later "touched", then again only on set will the object be cleared and we haven't really improved the situation.

        Matthew, did you try that change, and, did it improve the scenario above?

        Failing that, I think we have to purge in get... maybe we can amortize it (every Nth get, where N is a factor of how many entries are in the map...).

        Also: I don't think PagedBytes should use CloseableThreadLocal... I think it should just new byte[].

        Separately: maybe SnowballAnalyzer is too heavy...? Does it have some static data that ought to be loaded once and shared across analyzers... but isn't today?

        Eks Dev added a comment -

        This is indeed a problem. Recently we moved to Solr on Tomcat and we hit it, in a slightly different form.

        The nature of the problem is the high thread churn on Tomcat, which, combined with expensive analyzers, wreaks gc() havoc (even without the stale CloseableThreadLocals from this issue). We are attacking the problem currently by reducing maxThreads and increasing minSpareThreads (and also reducing the time until forced thread renewal). The goal is to increase the lifetime of threads and to keep their number within reasonable limits. I would appreciate any tips in this direction.

        The problem with this strategy is that some cheap requests, not really related to search, may saturate a smallish thread pool... I am looking for a way to define separate thread pools, one for search/update requests and one for the rest, as it does not make sense to have 100 search threads searching Lucene on a dual-core box. Not really experienced with Tomcat...

        Of course, keeping Analyzer creation cheap helps (e.g. making the expensive background structures thread-safe so they can be shared, with only a thin analyzer on top of them). But this is not always easy.

        Just sharing experience here; maybe someone finds it helpful. Hints always welcome.


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Matthew Bellew
          • Votes:
            0
            Watchers:
            2
