Solr
  1. Solr
  2. SOLR-3587

After reloading a SolrCore, the original Analyzer is still used rather than a new one.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      It's a usual practice in Solr to overwrite synonyms/stopwords files and issue a core reload command to force Solr reload these files. We've noticed that this trick is no longer working on trunk.

      I started debugging this problem in Eclipse and I can see that Solr actually re-reads updated synonym file on core reload, but indexing process still uses old reference to Synonym filter / SynonymMap instance.

      When I start Solr initially I can see that my Solr server has 2 instances of SynonymMap (the object that holds actual synonyms data). This is expected as I have 2 SynonymFilters defined in my schema (one field with 2 synonym filters - index + query time)

      After a core reload I see that Solr has 4 instances of this class. So I thought that there could be some leak and issued many (N) core reload commands and expected to see 2*N instances. It didn't happen. I can only see 20 instances of this class.

      This is pretty interesting number as well because, according to thread dump/list, Solr/Jetty has 10 working threads in thread pool (by default for example configs). So I suspect that we cache something in thread local storage and it hits us.

      I looked into the code and found that Lucene caches token streams in ThreadLocal, but I don't know the code enough to state that this is the problem.

      So I took a different approach and found the commit that introduced this bug. I wrote a simple test (shell script) and used git bisect tool to chase this bug.

      ffd9c717448eca895d19be8ee9718bc399ac34a7 is the first bad commit
      commit ffd9c717448eca895d19be8ee9718bc399ac34a7
      Author: Mark Robert Miller <markrmiller@apache.org>
      Date: Thu Jun 30 13:59:59 2011 +0000

      SOLR-2193, SOLR-2565: The default Solr update handler has been improved so
      that it uses fewer locks, keeps the IndexWriter open rather than closing it
      on each commit (ie commits no longer wait for background merges to complete),
      works with SolrCore to provide faster 'soft' commits, and has an improved API
      that requires less instanceof special casing.

      You may now specify a 'soft' commit when committing. This will
      use Lucene's NRT feature to avoid guaranteeing documents are on stable storage in exchange
      for faster reopen times. There is also a new 'soft' autocommit tracker that can be
      configured.

      SolrCores now properly share IndexWriters across SolrCore reloads.

      git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1141542 13f79535-47bb-0310-9956-ffa450edef68

      It looks like sharing IndexWriters across SolrCore reloads could be the root cause, right?

      1. SOLR-3587.patch
        7 kB
        Alexey Serba
      2. SOLR-3587.patch
        23 kB
        Mark Miller
      3. SOLR-3587.patch
        9 kB
        Mark Miller

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          hoss20120711-manual-post-40alpha-change

          Show
          Hoss Man added a comment - hoss20120711-manual-post-40alpha-change
          Hide
          Mark Miller added a comment -

          Committed - thanks all!

          Show
          Mark Miller added a comment - Committed - thanks all!
          Hide
          Mark Miller added a comment -

          Another simple patch with alexey's test (now passes) that simply uses an updated analyzer. I made a new issue for the other IW settings.

          Show
          Mark Miller added a comment - Another simple patch with alexey's test (now passes) that simply uses an updated analyzer. I made a new issue for the other IW settings.
          Hide
          Mark Miller added a comment -

          mentioned patch attached for posterity - I'm going to give a shot at just updating the analyzer now.

          Show
          Mark Miller added a comment - mentioned patch attached for posterity - I'm going to give a shot at just updating the analyzer now.
          Hide
          Hoss Man added a comment -

          ...the analyzer is really the deal killer here...

          yeah, if there is a clean way to make sure we get the analyzer working in 4.0 that should be the main priority.

          if there are other ways to do "live" indexwriter setting changes that can make it into 4.0 great, but worse case scenerio we can always document the caveat (or even have an alternate code path where we compare the settings and if they've changed to a hard close and let some updates fail) ... but that's really secondary.

          the number of people who care about RELOADing cores with low level IW settings modified is probably a lot less then the number of people tweaking schema.xml

          Show
          Hoss Man added a comment - ...the analyzer is really the deal killer here... yeah, if there is a clean way to make sure we get the analyzer working in 4.0 that should be the main priority. if there are other ways to do "live" indexwriter setting changes that can make it into 4.0 great, but worse case scenerio we can always document the caveat (or even have an alternate code path where we compare the settings and if they've changed to a hard close and let some updates fail) ... but that's really secondary. the number of people who care about RELOADing cores with low level IW settings modified is probably a lot less then the number of people tweaking schema.xml
          Hide
          Mark Miller added a comment -

          Ah - still new to IWConfig - I was looking through the live stuff to try and change the analyzer.

          So that will still leave the other settings unchangeable, but its a much cleaner solution, and the analyzer is really the deal killer here, so I'll give that a shot.

          Show
          Mark Miller added a comment - Ah - still new to IWConfig - I was looking through the live stuff to try and change the analyzer. So that will still leave the other settings unchangeable, but its a much cleaner solution, and the analyzer is really the deal killer here, so I'll give that a shot.
          Hide
          Yonik Seeley added a comment -

          Can't we just pass null for the writer analyzer (or a dummy one if it can't be null) and then just use the IW APIs that accept Analyzer?

          Show
          Yonik Seeley added a comment - Can't we just pass null for the writer analyzer (or a dummy one if it can't be null) and then just use the IW APIs that accept Analyzer?
          Hide
          Mark Miller added a comment - - edited

          I have an early hack attempt at a workaround for this. It starts opening a new writer again, but also tracks how many writers are being used by forcing the caller to call a release method. When it's time to open a new writer, we stop giving the writer out, wait for all refs to come in, then open the new writer - then start giving refs to the new writer out.

          Some things that complicate this:

          An NRT reader takes the writer as an argument. I have not dealt with this, and I don't yet know how to do so nicely.

          A couple tests needed to be changed to use a FSDirectory because on rebooting the writer, no existing index was found in the RAMDir.

          Still mostly an exploratory idea/solution.

          I'll post an ugly patch in a while, but headed out for a bit now.

          Show
          Mark Miller added a comment - - edited I have an early hack attempt at a workaround for this. It starts opening a new writer again, but also tracks how many writers are being used by forcing the caller to call a release method. When it's time to open a new writer, we stop giving the writer out, wait for all refs to come in, then open the new writer - then start giving refs to the new writer out. Some things that complicate this: An NRT reader takes the writer as an argument. I have not dealt with this, and I don't yet know how to do so nicely. A couple tests needed to be changed to use a FSDirectory because on rebooting the writer, no existing index was found in the RAMDir. Still mostly an exploratory idea/solution. I'll post an ugly patch in a while, but headed out for a bit now.
          Hide
          Alexey Serba added a comment -

          Added Solr test

          Show
          Alexey Serba added a comment - Added Solr test

            People

            • Assignee:
              Mark Miller
              Reporter:
              Alexey Serba
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development