Solr
  1. Solr
  2. SOLR-2623

Solr JMX MBeans do not survive core reloads

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4, 1.4.1, 3.1, 3.2
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: multicore
    • Labels:
      None

      Description

      Solr JMX MBeans do not survive core reloads

      "Steps to reproduce"
      sh> cd example
      sh> vi multicore/core0/conf/solrconfig.xml # enable jmx
      sh> java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar
      sh> echo 'open 8842 # 8842 is java pid
      > domain solr/core0
      > beans
      > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
      ....
      solr/core0:id=core0,type=core
      solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
      solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
      solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
      solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
      ...
      solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
      solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
      sh> curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
      sh> echo 'open 8842 # 8842 is java pid
      > domain solr/core0
      > beans
      > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
      # there's only one bean left after Solr core reload
      solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main
      

      The root cause of this is Solr core reload behavior:

      1. create new core (which overwrites existing registered MBeans)
      2. register new core and close old one (we remove/un-register MBeans on oldCore.close)

      The correct sequence is:

      1. unregister MBeans from old core
      2. create and register new core
      3. close old core without touching MBeans
      1. SOLR-2623.patch
        4 kB
        Alexey Serba
      2. SOLR-2623.patch
        5 kB
        Alexey Serba
      3. SOLR-2623.patch
        5 kB
        Alexey Serba
      4. SOLR-2623.patch
        8 kB
        Shalin Shekhar Mangar
      5. SOLR-2623.patch
        8 kB
        Shalin Shekhar Mangar
      6. SOLR-2623-branch3x.patch
        5 kB
        Shalin Shekhar Mangar
      7. SOLR-2623-fixtest-branch_3x.patch
        4 kB
        Shalin Shekhar Mangar
      8. SOLR-2623-testfix-trunk.patch
        8 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          bulk close for 3.4

          Show
          Robert Muir added a comment - bulk close for 3.4
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 1148681 on trunk and 1148683 on branch_3x.

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 1148681 on trunk and 1148683 on branch_3x.
          Hide
          Shalin Shekhar Mangar added a comment -

          Fixed tests for trunk

          Show
          Shalin Shekhar Mangar added a comment - Fixed tests for trunk
          Hide
          Shalin Shekhar Mangar added a comment -

          Fixed SolrDynamicMBean.getAttribute to support reading coreHashCode.

          Fixed (and simplified) test to compare registered mbeans with info registry size. Comparing number of mbeans between core reloads is flawed because when tests run in parallel, mbeans from multiple tests can be registered in the same mbean server.

          Show
          Shalin Shekhar Mangar added a comment - Fixed SolrDynamicMBean.getAttribute to support reading coreHashCode. Fixed (and simplified) test to compare registered mbeans with info registry size. Comparing number of mbeans between core reloads is flawed because when tests run in parallel, mbeans from multiple tests can be registered in the same mbean server.
          Hide
          Shalin Shekhar Mangar added a comment -

          The test has problems when run in parallel.
          Also, the code was not merged correctly in branch_3x.

          Show
          Shalin Shekhar Mangar added a comment - The test has problems when run in parallel. Also, the code was not merged correctly in branch_3x.
          Hide
          Steve Rowe added a comment -

          The Maven build problem should be fixed now: LUCENE-3323.

          Show
          Steve Rowe added a comment - The Maven build problem should be fixed now: LUCENE-3323 .
          Hide
          Steve Rowe added a comment -

          the new test has never succeeded under Maven on branch_3x. On trunk, it succeeds.

          It failed today on the Jenkins trunk Maven build, so I guess it's not so strange: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/189/testReport/junit/org.apache.solr.core/TestJmxIntegration/testJmxOnCoreReload/

          I can cause this to fail under IntelliJ if I set the working directory to the location where all the resources are copied to. That is, if SolrTestCaseJ4.getFile() finds solr/conf/ in the current directory, solr home is set to ./solr/solr/.

          Show
          Steve Rowe added a comment - the new test has never succeeded under Maven on branch_3x. On trunk, it succeeds. It failed today on the Jenkins trunk Maven build, so I guess it's not so strange: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/189/testReport/junit/org.apache.solr.core/TestJmxIntegration/testJmxOnCoreReload/ I can cause this to fail under IntelliJ if I set the working directory to the location where all the resources are copied to. That is, if SolrTestCaseJ4.getFile() finds solr/conf/ in the current directory, solr home is set to ./solr/solr/ .
          Hide
          Steve Rowe added a comment - - edited

          The new test you added has never succeeded under Maven

          That's incorrect - the new test has never succeeded under Maven on branch_3x. On trunk, it succeeds. Which makes this failure even stranger...

          Show
          Steve Rowe added a comment - - edited The new test you added has never succeeded under Maven That's incorrect - the new test has never succeeded under Maven on branch_3x . On trunk, it succeeds. Which makes this failure even stranger...
          Hide
          Steve Rowe added a comment -

          Hi Shalin,

          The new test you added has never succeeded under Maven - here's the first failure:

          https://builds.apache.org/job/Lucene-Solr-Maven-3.x/178/testReport/junit/org.apache.solr.core/TestJmxIntegration/testJmxOnCoreReload/

          Unlike the Ant build, the Maven build copies resources over to solr/build/, and for the Solr core+solrj tests (they run together under Maven), this includes core/src/test-files/, test-framework/src/test-files/, and solrj/src/test-files/, so the environment is somewhat different than under Ant.

          I took a quick look at the code and I don't understand what's happening - do you? The file that can't be found solr/solr/conf/solrconfig.xml is instead located at solr/conf/solrconfig.xml, so it looks like an extra solr/ is being prepended for some reason?

          Show
          Steve Rowe added a comment - Hi Shalin, The new test you added has never succeeded under Maven - here's the first failure: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/178/testReport/junit/org.apache.solr.core/TestJmxIntegration/testJmxOnCoreReload/ Unlike the Ant build, the Maven build copies resources over to solr/build/, and for the Solr core+solrj tests (they run together under Maven), this includes core/src/test-files/ , test-framework/src/test-files/ , and solrj/src/test-files/ , so the environment is somewhat different than under Ant. I took a quick look at the code and I don't understand what's happening - do you? The file that can't be found solr/solr/conf/solrconfig.xml is instead located at solr/conf/solrconfig.xml , so it looks like an extra solr/ is being prepended for some reason?
          Hide
          Mark Miller added a comment -

          Is this intentional?

          Its how noble and I first set things up if I remember right - I've wanted to address it since as it has a few not nice side affects - have never gotten around to it though.

          Show
          Mark Miller added a comment - Is this intentional? Its how noble and I first set things up if I remember right - I've wanted to address it since as it has a few not nice side affects - have never gotten around to it though.
          Hide
          Shalin Shekhar Mangar added a comment -

          I've noticed the same problem with JMX domain "solr/" (empty collection name) after core reload, but I believe I fixed that problem in my patch.

          I removed that part from your patch because I was not sure of its consequences. Also on reload, it was going back to an empty string. I used to think that in 4.0, the default core would be registered with both the original name as well as empty string but it is not so. We should fix it but that belongs to a separate issue.

          Show
          Shalin Shekhar Mangar added a comment - I've noticed the same problem with JMX domain "solr/" (empty collection name) after core reload, but I believe I fixed that problem in my patch. I removed that part from your patch because I was not sure of its consequences. Also on reload, it was going back to an empty string. I used to think that in 4.0, the default core would be registered with both the original name as well as empty string but it is not so. We should fix it but that belongs to a separate issue.
          Hide
          Alexey Serba added a comment -

          However, regardless of what name you specify in the solr.xml, the default core's name is always an empty string. Is this intentional?

          Yes, I guess this is part of default core name normalization. I've noticed the same problem with JMX domain "solr/" (empty collection name) after core reload, but I believe I fixed that problem in my patch.

          Show
          Alexey Serba added a comment - However, regardless of what name you specify in the solr.xml, the default core's name is always an empty string. Is this intentional? Yes, I guess this is part of default core name normalization. I've noticed the same problem with JMX domain "solr/" (empty collection name) after core reload, but I believe I fixed that problem in my patch.
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 1145518 on trunk and 1145527 on branch_3x

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 1145518 on trunk and 1145527 on branch_3x
          Hide
          Shalin Shekhar Mangar added a comment -

          Patch for branch 3x

          Show
          Shalin Shekhar Mangar added a comment - Patch for branch 3x
          Hide
          Shalin Shekhar Mangar added a comment -

          Patch updated to trunk.

          I was mistaken about the core name changing on reload.

          I'll commit this shortly.

          However, regardless of what name you specify in the solr.xml, the default core's name is always an empty string. Is this intentional?

          Show
          Shalin Shekhar Mangar added a comment - Patch updated to trunk. I was mistaken about the core name changing on reload. I'll commit this shortly. However, regardless of what name you specify in the solr.xml, the default core's name is always an empty string. Is this intentional?
          Hide
          Shalin Shekhar Mangar added a comment -

          Here's a patch which fixes the issue. I've reused Alexey's tests with the solution I proposed earlier.

          The problem with the core name changing across reloads is something we can address in another issue.

          Show
          Shalin Shekhar Mangar added a comment - Here's a patch which fixes the issue. I've reused Alexey's tests with the solution I proposed earlier. The problem with the core name changing across reloads is something we can address in another issue.
          Hide
          Shalin Shekhar Mangar added a comment -

          There's another bug with core reload that I found while running Alexey's test. Suppose there's only one core with name "X" and you reload "X", it then becomes registered with "" as the core name. So all your jmx monitoring is now useless because the key names have changed.

          Show
          Shalin Shekhar Mangar added a comment - There's another bug with core reload that I found while running Alexey's test. Suppose there's only one core with name "X" and you reload "X", it then becomes registered with "" as the core name. So all your jmx monitoring is now useless because the key names have changed.
          Hide
          Hoss Man added a comment -

          Grr... right, right. ObjectInstance != MBean.

          Show
          Hoss Man added a comment - Grr... right, right. ObjectInstance != MBean.
          Hide
          Shalin Shekhar Mangar added a comment -

          Hoss, I wish there was a way to do just that. I looked and looked but couldn't find it. The JMX API is really screwed up. Once you send in a MBean, apparently you can't get it out again. I'd be interested if anyone knew of a way to do that.

          Show
          Shalin Shekhar Mangar added a comment - Hoss, I wish there was a way to do just that. I looked and looked but couldn't find it. The JMX API is really screwed up. Once you send in a MBean, apparently you can't get it out again. I'd be interested if anyone knew of a way to do that.
          Hide
          Hoss Man added a comment -

          Alexey: at first glance, i think i would prefer Shalin's suggestion over your patch.

          My main hesitation about your approach is the parameterized close method – If we really go that route i'd much rather see something like a "SolrCore.preCloseToReleaesResources()" method. But more fundementally, if we unregister the MBeans before creating the new core, there is a window of time when the old core is responding to requests, but can't be monitored (and if anything goes wrong with creating the new core, the old one will continue to handle requests indefinitely but be totally unmonitorable.

          That said: i suspect the fix might even be easier then what Shalin proposed (which would require making SolrCore passing itself into the JmxMonitoredMap) ... can't we essentially change JmxMonitoredMap.unregsiter(String,SolrInfoMBean) to have psuedo code like this..

          if (server.isRegistered(name)) {
            MBean existing = server.getMBean(name)
            if (existing intsanceof SolrDynamicMBean && 
                existing.getSolrInfoMBean() == this.get(name)) {
              server.unregisterMBean(name);
            } else {
               // :NOOP: MBean is not ours
            }
          }
          

          ...adding a package protected SolrDynamicMBean.getSolrInfoMBean() seems less invasive then passing the SolrCore to another class

          Show
          Hoss Man added a comment - Alexey: at first glance, i think i would prefer Shalin's suggestion over your patch. My main hesitation about your approach is the parameterized close method – If we really go that route i'd much rather see something like a "SolrCore.preCloseToReleaesResources()" method. But more fundementally, if we unregister the MBeans before creating the new core, there is a window of time when the old core is responding to requests, but can't be monitored (and if anything goes wrong with creating the new core, the old one will continue to handle requests indefinitely but be totally unmonitorable. That said: i suspect the fix might even be easier then what Shalin proposed (which would require making SolrCore passing itself into the JmxMonitoredMap) ... can't we essentially change JmxMonitoredMap.unregsiter(String,SolrInfoMBean) to have psuedo code like this.. if (server.isRegistered(name)) { MBean existing = server.getMBean(name) if (existing intsanceof SolrDynamicMBean && existing.getSolrInfoMBean() == this .get(name)) { server.unregisterMBean(name); } else { // :NOOP: MBean is not ours } } ...adding a package protected SolrDynamicMBean.getSolrInfoMBean() seems less invasive then passing the SolrCore to another class
          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks for reporting this Alexey. I think the right way to fix it would be to modify the JmxMonitoredMap. Right now, the unregister method checks if a given key is registered with the server and if so, unregisters it. On a core reload, the key is same but the bean instance is different so all keys are unregistered.

          We can modify the SolrDynamicMBean and put the parent core's hashCode as an extra attribute. Then in the unregister method, remove the mbean from the server after checking if the mbean's hashCode attribute has the same value.

          Show
          Shalin Shekhar Mangar added a comment - Thanks for reporting this Alexey. I think the right way to fix it would be to modify the JmxMonitoredMap. Right now, the unregister method checks if a given key is registered with the server and if so, unregisters it. On a core reload, the key is same but the bean instance is different so all keys are unregistered. We can modify the SolrDynamicMBean and put the parent core's hashCode as an extra attribute. Then in the unregister method, remove the mbean from the server after checking if the mbean's hashCode attribute has the same value.
          Hide
          Alexey Serba added a comment -

          Test + fix

          Show
          Alexey Serba added a comment - Test + fix
          Hide
          Alexey Serba added a comment -

          Added test

          Show
          Alexey Serba added a comment - Added test
          Show
          Alexey Serba added a comment - Related bug report in solr mailing list - http://www.lucidimagination.com/search/document/f109d695b7e5d2ae/weird_issue_with_solr_and_jconsole_jmx

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Alexey Serba
            • Votes:
              4 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development