Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-15227

Single host in a bad state can block collection creation for the cluster with autoscaling enabled

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 8.2
    • None
    • AutoScaling
    • None

    Description

      We configured a SolrCloud cluster (running 8.2) with this cluster autoscaling policy:

      {
        "set-cluster-preferences":[
          {
            "minimize":"cores",
            "precision":5
          },
          {
            "maximize":"freedisk",
            "precision":25
          },
          {
            "minimize":"sysLoadAvg",
            "precision":10
          }],
        "set-cluster-policy":[
          {
            "replica": "<2",
            "node": "#ANY"
          }],
        "set-trigger": {
          "name":".auto_add_replicas",
          "event":"nodeLost",
          "waitFor":"10m",
          "enabled":true,
          "actions":[
            {
              "name":"auto_add_replicas_plan",
              "class":"solr.AutoAddReplicasPlanAction"},
            {
              "name":"execute_plan",
              "class":"solr.ExecutePlanAction"}]
        }
      }

      A node was rebooted at one point, and when that node came back, it had trouble establishing a connection with ZK when it was initializing the CoreContainer. As a result, it returns 404s for (I think?) all admin requests.

      Now, any call to create a collection in that cluster throw an error, with this stacktrace:

      2021-03-04 12:47:03.615 ERROR (OverseerThreadFactory-141-thread-4-processing-n:HOST_REDACTED:8983_solr) [   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: COLLECTON_REDACTED operation: create failed:org.apache.solr.common.SolrException: Error getting replica locations : unable to get autoscaling policy session
          at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:195)
          at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
          at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
          at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:834)
      Caused by: org.apache.solr.common.SolrException: unable to get autoscaling policy session
          at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:129)
          at org.apache.solr.cloud.api.collections.Assign.getPositionsUsingPolicy(Assign.java:382)
          at org.apache.solr.cloud.api.collections.Assign$PolicyBasedAssignStrategy.assign(Assign.java:630)
          at org.apache.solr.cloud.api.collections.CreateCollectionCmd.buildReplicaPositions(CreateCollectionCmd.java:410)
          at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:190)
          ... 6 more
      Caused by: org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Error getting remote info
          at org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:78)
          at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
          at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
          at org.apache.solr.client.solrj.cloud.autoscaling.Row.<init>(Row.java:71)
          at org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.<init>(Policy.java:575)
          at org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:396)
          at org.apache.solr.client.solrj.cloud.autoscaling.Policy.createSession(Policy.java:358)
          at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.createSession(PolicyHelper.java:492)
          at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:457)
          at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:513)
          at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:127)
          ... 10 more
      Caused by: org.apache.solr.common.SolrException: Error getting remote info
          at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:364)
          at org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76)
          ... 20 more
      Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://HOSTNAME_REDACTED:8983/solr: Expected mime type application/octet-stream but got text/html. <html>
      <head>
      <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
      <title>Error 404 Not Found</title>
      </head>
      <body><h2>HTTP ERROR 404</h2>
      <p>Problem accessing /solr/admin/metrics. Reason:
      <pre>    Not Found</pre></p><h3>Caused by:</h3><pre>javax.servlet.ServletException: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
          at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:168)
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          at org.eclipse.jetty.server.Server.handle(Server.java:505)
          at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
          at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
          at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
          at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
          at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
          at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
          at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
          at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
          at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
          at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
          at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
          at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
          at java.base/java.lang.Thread.run(Thread.java:834)
      Caused by: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:369)
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
          at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
          at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
          at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
          at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
          at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
          at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
          at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
          at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
          at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
          at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
          at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
          at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
          ... 21 more
      </pre>
      <h3>Caused by:</h3><pre>javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:369)
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
          at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
          at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
          at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
          at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
          at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
          at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
          at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
          at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
          at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
          at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
          at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
          at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
          at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          at org.eclipse.jetty.server.Server.handle(Server.java:505)
          at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
          at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
          at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
          at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
          at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
          at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
          at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
          at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
          at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
          at org.eclipse.jetty....

      I looked through the Solr code and to me it looks like:

       

      I realize this autoscaling code is gone in 9.x, but at least wanted to report this issue for documentation purposes, in case others see this.

      Attachments

        Activity

          People

            Unassigned Unassigned
            athrog Andy Throgmorton
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: