Solr / SOLR-10054

Core swapping doesn't work with new metrics changes in place

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.4, 7.0
    • Fix Version/s: 6.4.1, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      The new 6.4.0 version includes significant changes to metrics. These changes have broken core swapping. I will attach some screenshots.

      For the screenshots that I will attach, I started Solr directly from the 6.4.0 download on Windows 7 (bin\solr start). Then I created a "foo" core and a "bar" core, each from a different configset, using the bin\solr command.

      • Screenshot 1: you can see the two cores in CoreAdmin.
      • Screenshot 2: Attempting to swap the cores, an error message appears about a metric already existing for the ping handler.
      • Screenshot 3: Clicking somewhere else and then back to CoreAdmin shows that both cores have the same name – bar.
      • If Solr is stopped and then started back up, the admin UI looks like screenshot 1 again – the change that caused two cores with the same name only took place within the running Solr and did not update core.properties files.
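The swap in screenshot 2 was attempted through the admin UI. The same operation is exposed by the CoreAdmin HTTP API as `action=SWAP` with `core` and `other` parameters, which can be handy for reproducing the problem without the UI. Below is a minimal sketch that only builds the request URL; the base URL (`http://localhost:8983/solr`) and the class name `CoreSwapUrl` are assumptions for illustration, not something taken from this report.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds a CoreAdmin SWAP request URL. The base URL is an assumption (default local install). */
final class CoreSwapUrl {

    /** Returns the URL that asks CoreAdmin to swap the two named cores. */
    static String swapUrl(String baseUrl, String core, String other) {
        return baseUrl + "/admin/cores?action=SWAP"
                + "&core=" + URLEncoder.encode(core, StandardCharsets.UTF_8)
                + "&other=" + URLEncoder.encode(other, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Core names "foo" and "bar" match the cores created in the description.
        System.out.println(swapUrl("http://localhost:8983/solr", "foo", "bar"));
        // prints http://localhost:8983/solr/admin/cores?action=SWAP&core=foo&other=bar
    }
}
```

Requesting that URL against an affected 6.4.0 instance would be expected to trigger the same error as the UI, since both paths go through the CoreAdmin handler.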

      Attachments

      1. SOLR-10054.patch (19 kB, Andrzej Bialecki)
      2. SOLR-10054.patch (18 kB, Andrzej Bialecki)
      3. solr64coreswap1.png (107 kB, Shawn Heisey)
      4. solr64coreswap2.png (112 kB, Shawn Heisey)
      5. solr64coreswap3.png (104 kB, Shawn Heisey)

        Activity

        elyograg Shawn Heisey added a comment -

        Attaching screenshots.

        elyograg Shawn Heisey added a comment -

        The initial problem report came in via the #solr IRC channel.

        elyograg Shawn Heisey added a comment - - edited

        The error message visible in the admin UI is "A metric named ADMIN./admin/ping.requestTimes already exists"

        Below is the full ERROR entry from the logfile when the core swap is attempted. This is from the binary 6.4.0 release:

        2017-01-30 22:18:20.746 ERROR (qtp1769597131-22) [   ] o.a.s.h.RequestHandlerBase java.lang.IllegalArgumentException: A metric named ADMIN./admin/ping.requestTimes already exists
        	at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
        	at com.codahale.metrics.MetricRegistry.registerAll(MetricRegistry.java:389)
        	at com.codahale.metrics.MetricRegistry.registerAll(MetricRegistry.java:104)
        	at org.apache.solr.metrics.SolrMetricManager.moveMetrics(SolrMetricManager.java:227)
        	at org.apache.solr.metrics.SolrCoreMetricManager.afterCoreSetName(SolrCoreMetricManager.java:76)
        	at org.apache.solr.core.SolrCore.setName(SolrCore.java:423)
        	at org.apache.solr.core.SolrCores.swap(SolrCores.java:243)
        	at org.apache.solr.core.CoreContainer.swap(CoreContainer.java:1012)
        	at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$3(CoreAdminOperation.java:122)
        	at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:377)
        	at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:379)
        	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:165)
        	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
        	at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
        	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445)
        	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
        	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
        	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        	at org.eclipse.jetty.server.Server.handle(Server.java:534)
        	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        	at java.lang.Thread.run(Thread.java:745)
        
        elyograg Shawn Heisey added a comment -

        I see that there is "TestCoreAdmin" but the text "swap" does not appear in that test. I cannot find any evidence that core swapping is tested by any Solr tests at all.

        ab Andrzej Bialecki added a comment -

        Yeah, we definitely need more test coverage of such fundamental functionality...

        The issue turned out to be less trivial than I thought. When a core rename or swap is requested, the metrics subsystem needs to preserve the metrics already accumulated under the old core name (we use a separate registry per core). Since metrics initialization occurs when the core is constructed, we can't easily re-register all SolrMetricProducer-s in the new registry, so the existing code tried to move the actual Metric instances from the old registry to the new one. The problem was that more or less the same metric names already existed in the target registry, because they were registered there by the other core's producers, and vice versa.
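The failure mode described above can be illustrated with a small self-contained sketch. This is not Solr's code: `Registry` is a hypothetical stand-in for a per-core metric registry that, like the registry in the stack trace, rejects duplicate names with an `IllegalArgumentException`, and `moveMetrics` is a simplified guess at a naive "copy everything into the target" approach.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified illustration of why moving metrics between two live registries fails. */
final class MoveMetricsFailure {

    /** Hypothetical stand-in for a per-core registry: duplicate names are rejected. */
    static final class Registry {
        final Map<String, Object> metrics = new LinkedHashMap<>();

        void register(String name, Object metric) {
            if (metrics.putIfAbsent(name, metric) != null) {
                throw new IllegalArgumentException("A metric named " + name + " already exists");
            }
        }
    }

    /** Naive move: register every metric from the source in the target, then clear the source. */
    static void moveMetrics(Registry from, Registry to) {
        for (Map.Entry<String, Object> e : from.metrics.entrySet()) {
            to.register(e.getKey(), e.getValue());
        }
        from.metrics.clear();
    }

    /** Both cores registered the same handler metric, so the move collides on the first name. */
    static String tryMove() {
        Registry foo = new Registry();
        Registry bar = new Registry();
        foo.register("ADMIN./admin/ping.requestTimes", new Object());
        bar.register("ADMIN./admin/ping.requestTimes", new Object());
        try {
            moveMetrics(foo, bar);
            return "moved";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryMove());
    }
}
```

Because both cores expose the same handlers, the very first metric name already exists in the target, which matches the message seen in the admin UI.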

        The solution was to implement a dedicated operation SolrMetricManager.swap(name1, name2) that knows how to atomically (or rather under proper locking) swap these two registries, without moving metric instances between registries.
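A rough sketch of the registry-swap idea follows. It is an assumption-laden illustration rather than the code from the patch: the class `RegistrySwapSketch`, its map-of-maps layout, and the single `ReentrantLock` are all hypothetical, chosen only to show that swapping the name-to-registry bindings avoids touching individual metric instances.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical manager that keeps one registry (here: a plain map) per core name. */
final class RegistrySwapSketch {
    private final Map<String, Map<String, Object>> registries = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    /** Returns the registry bound to the given name, creating an empty one if needed. */
    Map<String, Object> registry(String name) {
        lock.lock();
        try {
            return registries.computeIfAbsent(name, k -> new HashMap<>());
        } finally {
            lock.unlock();
        }
    }

    /** Swaps the registries bound to the two names; no metric is re-registered or moved. */
    void swap(String name1, String name2) {
        lock.lock();
        try {
            Map<String, Object> r1 = registries.remove(name1);
            Map<String, Object> r2 = registries.remove(name2);
            if (r2 != null) {
                registries.put(name1, r2);
            }
            if (r1 != null) {
                registries.put(name2, r1);
            }
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        RegistrySwapSketch manager = new RegistrySwapSketch();
        manager.registry("foo").put("ADMIN./admin/ping.requestTimes", 1);
        manager.registry("bar").put("ADMIN./admin/ping.requestTimes", 2);
        manager.swap("foo", "bar");
        // The name "foo" now resolves to the registry that was previously bound to "bar".
        System.out.println(manager.registry("foo").get("ADMIN./admin/ping.requestTimes"));
    }
}
```

Since only the bindings change, identical metric names in the two registries never collide, and the accumulated values follow each core to its new name.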

        I also added TestCoreAdmin.testCoreSwap and testValidCoreRename, and extended CoreAdminRequest to support the swap operation.

        ab Andrzej Bialecki added a comment -

        Patch relative to branch_6x.

        ab Andrzej Bialecki added a comment -

        Updated patch. All tests are passing; I think this is ready.

        jira-bot ASF subversion and git services added a comment -

        Commit 8299378eab3282e4dcb14b92645a4f1d214f13cc in lucene-solr's branch refs/heads/branch_6x from Andrzej Bialecki
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8299378 ]

        SOLR-10054: Core swapping doesn't work with new metrics changes in place.

        jira-bot ASF subversion and git services added a comment -

        Commit bef725aeefea0ba34bdf9c74b8e67376377e8983 in lucene-solr's branch refs/heads/master from Andrzej Bialecki
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bef725a ]

        SOLR-10054: Core swapping doesn't work with new metrics changes in place.


          People

          • Assignee:
            ab Andrzej Bialecki
            Reporter:
            elyograg Shawn Heisey
          • Votes:
            0
            Watchers:
            5
