Solr
  1. Solr
  2. SOLR-5601

NPE in OverseerCollectionProcessor.migrateKey - inconsistent failure from MigrateRouteKeyTest

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.7
    • Fix Version/s: 4.7, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Encountered a weird semi-reproducible failure from MigrateRouteKeyTest....

      ant test  -Dtestcase=MigrateRouteKeyTest -Dtests.method=testDistribSearch -Dtests.seed=26F9FF92782251B6 -Dtests.slow=true -Dtests.locale=sr_CS -Dtests.timezone=Europe/Zaporozhye -Dtests.file.encoding=UTF-8
      

      Attempting to reproduce with that line caused failures the first 2 times i tried, and then succeeded every time after that.

      Details of failure to follow in comment.

      (NOTE: this may be related to some recent jenkins failures due to thread leaks from this test)

      1. SOLR-5601.patch
        1 kB
        Shalin Shekhar Mangar

        Activity

        Hide
        Hoss Man added a comment -

        The initial start of trouble seened to me an NPE in OverseerCollectionProcessor.migrateKey...

           [junit4]   2> 324502 T1186 oasc.ZkController.checkRecovery I am the leader, no recovery necessary
           [junit4]   2> 324502 T1186 oasc.ZkController.publish publishing core=split_shard2_temp_shard1_shard1_replica1 state=active
           [junit4]   2> 324514 T1228 C220 P42114 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={distrib.from=http://127.0.0.1:47570/tr_/xq/collection1/&version=2&wt=javabin&update.distrib=FROMLEADER} {add=[m/1!116 (1456162963872284672)]} 0 0
           [junit4]   2> 324515 T1189 C221 P47570 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={version=2&wt=javabin} {add=[m/1!116 (1456162963872284672)]} 0 3
           [junit4]   2> 324526 T1186 oascc.ZkStateReader.updateClusterState Updating cloud state from ZooKeeper... 
           [junit4]   2> 324527 T1186 oasc.SolrXMLCoresLocator.doPersist Persisted core descriptions to /home/hossman/lucene/dev/solr/build/solr-core/test/J2/../../../../../../../../../home/hossman/lucene/dev/solr/build/solr-core/test/J2/./org.apache.solr.cloud.MigrateRouteKeyTest-jetty1-1388705195623/solr.xml
           [junit4]   2> 324527 T1186 oass.SolrDispatchFilter.handleAdminRequest [admin] webapp=null path=/admin/cores params={numShards=1&collection.configName=conf1&version=2&wt=javabin&name=split_shard2_temp_shard1_shard1_replica1&shard=shard1&collection=split_shard2_temp_shard1&qt=/admin/cores&action=CREATE} status=0 QTime=1656 
           [junit4]   2> 324528 T1174 oasc.OverseerCollectionProcessor.createCollection Finished create command on all shards for collection: split_shard2_temp_shard1
           [junit4]   2> 324528 T1174 oasc.OverseerCollectionProcessor.migrateKey Asking source leader to wait for: split_shard2_temp_shard1_shard1_replica1 to be alive on: 127.0.0.1:47570_tr_%2Fxq
           [junit4]   2> 324528 T1174 oasc.SolrException.log ERROR Collection migrate of migrate failed:java.lang.NullPointerException
           [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.migrateKey(OverseerCollectionProcessor.java:1208)
           [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.migrate(OverseerCollectionProcessor.java:1102)
           [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:252)
           [junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:176)
           [junit4]   2> 		at java.lang.Thread.run(Thread.java:724)
           [junit4]   2> 	
           [junit4]   2> 324551 T1217 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work/qnr-0000000002 state: SyncConnected type NodeDataChanged
           [junit4]   2> 324552 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
           [junit4]   2> 324552 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
           [junit4]   2> 324559 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
           [junit4]   2> 324559 T1174 oasc.OverseerCollectionProcessor.run Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-0000000002 complete, response:{success={null={responseHeader={status=0,QTime=0},core=migrate_multipleshardtest_targetCollection_shard1_replica1,status=BUFFERING},null={responseHeader={status=0,QTime=1656},core=split_shard2_temp_shard1_shard1_replica1}},Operation migrate caused exception:=java.lang.NullPointerException,exception={msg=null,rspCode=-1}}
           [junit4]   2> 324568 T1173 oascc.ZkStateReader.updateClusterState Updating cloud state from ZooKeeper... 
           [junit4]   2> 324569 T1173 oasc.Overseer$ClusterStateUpdater.updateState Update state numShards=1 message={
           [junit4]   2> 	  "operation":"state",
           [junit4]   2> 	  "state":"active",
           [junit4]   2> 	  "base_url":"http://127.0.0.1:47570/tr_/xq",
           [junit4]   2> 	  "core":"split_shard2_temp_shard1_shard1_replica1",
           [junit4]   2> 	  "roles":null,
           [junit4]   2> 	  "node_name":"127.0.0.1:47570_tr_%2Fxq",
           [junit4]   2> 	  "shard":"shard1",
           [junit4]   2> 	  "shard_range":null,
           [junit4]   2> 	  "shard_state":"active",
           [junit4]   2> 	  "shard_parent":null,
           [junit4]   2> 	  "collection":"split_shard2_temp_shard1",
           [junit4]   2> 	  "numShards":"1",
           [junit4]   2> 	  "core_node_name":"core_node1"}
           [junit4]   2> 324570 T1209 oass.SolrDispatchFilter.handleAdminRequest [admin] webapp=null path=/admin/collections params={split.key=a/1!&version=2&wt=javabin&forward.timeout=45&collection=collection1&target.collection=migrate_multipleshardtest_targetCollection&action=MIGRATE} status=500 QTime=2211 
           [junit4]   2> 324571 T1209 oasc.SolrException.log ERROR null:org.apache.solr.common.SolrException
           [junit4]   2> 		at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:204)
           [junit4]   2> 		at org.apache.solr.handler.admin.CollectionsHandler.handleMigrate(CollectionsHandler.java:450)
           [junit4]   2> 		at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:174)
           [junit4]   2> 		at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
           [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:705)
           [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
           [junit4]   2> 		at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:200)
           [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
           [junit4]   2> 		at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136)
           [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
           [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
           [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
           [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
           [junit4]   2> 		at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
           [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
           [junit4]   2> 		at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
           [junit4]   2> 		at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
           [junit4]   2> 		at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
           [junit4]   2> 		at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
           [junit4]   2> 		at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
           [junit4]   2> 		at org.eclipse.jetty.server.Server.handle(Server.java:368)
           [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
           [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
           [junit4]   2> 		at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
           [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
           [junit4]   2> 		at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
           [junit4]   2> 		at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
           [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
           [junit4]   2> 		at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
           [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
           [junit4]   2> 		at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
           [junit4]   2> 		at java.lang.Thread.run(Thread.java:724)
           [junit4]   2> 	
           [junit4]   2> 324572 T1249 C222 P46266 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={distrib.from=http://127.0.0.1:52363/tr_/xq/collection1/&version=2&wt=javabin&update.distrib=FROMLEADER} {add=[n/1!117 (1456162963931004928)]} 0 0
           [junit4]   2> 324573 T1207 C223 P52363 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={version=2&wt=javabin} {add=[n/1!117 (1456162963931004928)]} 0 5
           [junit4]   2> 324585 T1146 oasc.ChaosMonkey.monkeyLog monkey: stop shard! 36682
           [junit4]   2> 324585 T1146 oasc.CoreContainer.shutdown Shutting down CoreContainer instance=2033417453
           [junit4]   2> 324586 T1146 oasc.ZkController.publish publishing core=collection1 state=down
           [junit4]   2> 324586 T1146 oasc.ZkController.publish numShards not found on descriptor - reading it from system property
           [junit4]   2> 324617 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
        

        ...which then cascaded into an "IllegalStateException: Connection pool shut down"...

           [junit4]   2> 324711 T1281 oascc.ZkStateReader.createClusterStateWatchersAndUpdate Updating cluster state from ZooKeeper... 
           [junit4]   2> 324718 T1197 oascc.SolrZkClient.makePath makePath: /overseer_elect/leader
           [junit4]   2> 324722 T1146 oejsh.ContextHandler.doStop stopped o.e.j.s.ServletContextHandler{/tr_/xq,null}
           [junit4]   2> 324726 T1197 oasc.Overseer.start Overseer (id=91010183481589766-127.0.0.1:47570_tr_%2Fxq-n_0000000001) starting
           [junit4]   2> 324727 T1281 oasc.MigrateRouteKeyTest$Indexer.run ERROR Exception while adding document id: id=o/1!118 org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down
           [junit4]   2> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:361)
           [junit4]   2> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:510)
           [junit4]   2> 	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
           [junit4]   2> 	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
           [junit4]   2> 	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
           [junit4]   2> 	at org.apache.solr.cloud.MigrateRouteKeyTest$Indexer.run(MigrateRouteKeyTest.java:252)
           [junit4]   2> Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down
           [junit4]   2> 	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:310)
           [junit4]   2> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:358)
           [junit4]   2> 	... 5 more
           [junit4]   2> Caused by: java.lang.IllegalStateException: Connection pool shut down
           [junit4]   2> 	at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:167)
           [junit4]   2> 	at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:199)
           [junit4]   2> 	at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:188)
           [junit4]   2> 	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:448)
           [junit4]   2> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
           [junit4]   2> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
           [junit4]   2> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
           [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395)
           [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
           [junit4]   2> 	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:283)
           [junit4]   2> 	... 6 more
           [junit4]   2> 
        

        ...lots more "IllegalStateException: Connection pool shut down" for various doc add attempts, ultimatly leading to...

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=MigrateRouteKeyTest -Dtests.method=testDistribSearch -Dtests.seed=26F9FF92782251B6 -Dtests.slow=true -Dtests.locale=sr_CS -Dtests.timezone=Europe/Zaporozhye -Dtests.file.encoding=UTF-8
           [junit4] ERROR   28.8s J2 | MigrateRouteKeyTest.testDistribSearch <<<
           [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server Error
           [junit4]    > request: http://127.0.0.1:52363/tr_/xq/admin/collections?action=MIGRATE&collection=collection1&target.collection=migrate_multipleshardtest_targetCollection&split.key=a/1!&forward.timeout=45&wt=javabin&version=2
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([26F9FF92782251B6:A71F718A0F7D318A]:0)
           [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
           [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
           [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.invoke(MigrateRouteKeyTest.java:139)
           [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.invokeMigrateApi(MigrateRouteKeyTest.java:125)
           [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.multipleShardMigrateTest(MigrateRouteKeyTest.java:198)
           [junit4]    > 	at org.apache.solr.cloud.MigrateRouteKeyTest.doTest(MigrateRouteKeyTest.java:93)
           [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:867)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:724)
        

        ...and then...

           [junit4]   2> 50947 T1145 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 4 threads leaked from SUITE scope at org.apache.solr.cloud.MigrateRouteKeyTest: 
           [junit4]   2> 	   1) Thread[id=1298, name=Timer-98, state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest]
           [junit4]   2> 	        at java.lang.Object.wait(Native Method)
           [junit4]   2> 	        at java.util.TimerThread.mainLoop(Timer.java:552)
           [junit4]   2> 	        at java.util.TimerThread.run(Timer.java:505)
           [junit4]   2> 	   2) Thread[id=1299, name=Thread-532-SendThread(localhost:46743), state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest]
           [junit4]   2> 	        at java.lang.Thread.sleep(Native Method)
           [junit4]   2> 	        at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
           [junit4]   2> 	        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
           [junit4]   2> 	        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
           [junit4]   2> 	   3) Thread[id=1300, name=Thread-532-EventThread, state=WAITING, group=TGRP-MigrateRouteKeyTest]
           [junit4]   2> 	        at sun.misc.Unsafe.park(Native Method)
           [junit4]   2> 	        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
           [junit4]   2> 	        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
           [junit4]   2> 	        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
           [junit4]   2> 	        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
           [junit4]   2> 	   4) Thread[id=1281, name=Thread-532, state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest]
           [junit4]   2> 	        at java.lang.Object.wait(Native Method)
           [junit4]   2> 	        at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:204)
           [junit4]   2> 	        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:139)
           [junit4]   2> 	        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:98)
           [junit4]   2> 	        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:88)
           [junit4]   2> 	        at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:183)
           [junit4]   2> 	        at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:240)
           [junit4]   2> 	        at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:501)
           [junit4]   2> 	        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
           [junit4]   2> 	        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
           [junit4]   2> 	        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
           [junit4]   2> 	        at org.apache.solr.cloud.MigrateRouteKeyTest$Indexer.run(MigrateRouteKeyTest.java:252)
        
        Show
        Hoss Man added a comment - The initial start of trouble seened to me an NPE in OverseerCollectionProcessor.migrateKey... [junit4] 2> 324502 T1186 oasc.ZkController.checkRecovery I am the leader, no recovery necessary [junit4] 2> 324502 T1186 oasc.ZkController.publish publishing core=split_shard2_temp_shard1_shard1_replica1 state=active [junit4] 2> 324514 T1228 C220 P42114 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={distrib.from=http://127.0.0.1:47570/tr_/xq/collection1/&version=2&wt=javabin&update.distrib=FROMLEADER} {add=[m/1!116 (1456162963872284672)]} 0 0 [junit4] 2> 324515 T1189 C221 P47570 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={version=2&wt=javabin} {add=[m/1!116 (1456162963872284672)]} 0 3 [junit4] 2> 324526 T1186 oascc.ZkStateReader.updateClusterState Updating cloud state from ZooKeeper... [junit4] 2> 324527 T1186 oasc.SolrXMLCoresLocator.doPersist Persisted core descriptions to /home/hossman/lucene/dev/solr/build/solr-core/test/J2/../../../../../../../../../home/hossman/lucene/dev/solr/build/solr-core/test/J2/./org.apache.solr.cloud.MigrateRouteKeyTest-jetty1-1388705195623/solr.xml [junit4] 2> 324527 T1186 oass.SolrDispatchFilter.handleAdminRequest [admin] webapp=null path=/admin/cores params={numShards=1&collection.configName=conf1&version=2&wt=javabin&name=split_shard2_temp_shard1_shard1_replica1&shard=shard1&collection=split_shard2_temp_shard1&qt=/admin/cores&action=CREATE} status=0 QTime=1656 [junit4] 2> 324528 T1174 oasc.OverseerCollectionProcessor.createCollection Finished create command on all shards for collection: split_shard2_temp_shard1 [junit4] 2> 324528 T1174 oasc.OverseerCollectionProcessor.migrateKey Asking source leader to wait for: split_shard2_temp_shard1_shard1_replica1 to be alive on: 127.0.0.1:47570_tr_%2Fxq [junit4] 2> 324528 T1174 oasc.SolrException.log ERROR Collection migrate of migrate failed:java.lang.NullPointerException [junit4] 2> at org.apache.solr.cloud.OverseerCollectionProcessor.migrateKey(OverseerCollectionProcessor.java:1208) [junit4] 2> at org.apache.solr.cloud.OverseerCollectionProcessor.migrate(OverseerCollectionProcessor.java:1102) [junit4] 2> at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:252) [junit4] 2> at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:176) [junit4] 2> at java.lang.Thread.run(Thread.java:724) [junit4] 2> [junit4] 2> 324551 T1217 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work/qnr-0000000002 state: SyncConnected type NodeDataChanged [junit4] 2> 324552 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged [junit4] 2> 324552 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged [junit4] 2> 324559 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged [junit4] 2> 324559 T1174 oasc.OverseerCollectionProcessor.run Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-0000000002 complete, response:{success={null={responseHeader={status=0,QTime=0},core=migrate_multipleshardtest_targetCollection_shard1_replica1,status=BUFFERING},null={responseHeader={status=0,QTime=1656},core=split_shard2_temp_shard1_shard1_replica1}},Operation migrate caused exception:=java.lang.NullPointerException,exception={msg=null,rspCode=-1}} [junit4] 2> 324568 T1173 oascc.ZkStateReader.updateClusterState Updating cloud state from ZooKeeper... [junit4] 2> 324569 T1173 oasc.Overseer$ClusterStateUpdater.updateState Update state numShards=1 message={ [junit4] 2> "operation":"state", [junit4] 2> "state":"active", [junit4] 2> "base_url":"http://127.0.0.1:47570/tr_/xq", [junit4] 2> "core":"split_shard2_temp_shard1_shard1_replica1", [junit4] 2> "roles":null, [junit4] 2> "node_name":"127.0.0.1:47570_tr_%2Fxq", [junit4] 2> "shard":"shard1", [junit4] 2> "shard_range":null, [junit4] 2> "shard_state":"active", [junit4] 2> "shard_parent":null, [junit4] 2> "collection":"split_shard2_temp_shard1", [junit4] 2> "numShards":"1", [junit4] 2> "core_node_name":"core_node1"} [junit4] 2> 324570 T1209 oass.SolrDispatchFilter.handleAdminRequest [admin] webapp=null path=/admin/collections params={split.key=a/1!&version=2&wt=javabin&forward.timeout=45&collection=collection1&target.collection=migrate_multipleshardtest_targetCollection&action=MIGRATE} status=500 QTime=2211 [junit4] 2> 324571 T1209 oasc.SolrException.log ERROR null:org.apache.solr.common.SolrException [junit4] 2> at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:204) [junit4] 2> at org.apache.solr.handler.admin.CollectionsHandler.handleMigrate(CollectionsHandler.java:450) [junit4] 2> at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:174) [junit4] 2> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:705) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260) [junit4] 2> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:200) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2> at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) [junit4] 2> at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077) [junit4] 2> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) [junit4] 2> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) [junit4] 2> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) [junit4] 2> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) [junit4] 2> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) [junit4] 2> at org.eclipse.jetty.server.Server.handle(Server.java:368) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) [junit4] 2> at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) [junit4] 2> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) [junit4] 2> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) [junit4] 2> at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) [junit4] 2> at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) [junit4] 2> at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) [junit4] 2> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) [junit4] 2> at java.lang.Thread.run(Thread.java:724) [junit4] 2> [junit4] 2> 324572 T1249 C222 P46266 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={distrib.from=http://127.0.0.1:52363/tr_/xq/collection1/&version=2&wt=javabin&update.distrib=FROMLEADER} {add=[n/1!117 (1456162963931004928)]} 0 0 [junit4] 2> 324573 T1207 C223 P52363 oasup.LogUpdateProcessor.finish [collection1] webapp=/tr_/xq path=/update params={version=2&wt=javabin} {add=[n/1!117 (1456162963931004928)]} 0 5 [junit4] 2> 324585 T1146 oasc.ChaosMonkey.monkeyLog monkey: stop shard! 36682 [junit4] 2> 324585 T1146 oasc.CoreContainer.shutdown Shutting down CoreContainer instance=2033417453 [junit4] 2> 324586 T1146 oasc.ZkController.publish publishing core=collection1 state=down [junit4] 2> 324586 T1146 oasc.ZkController.publish numShards not found on descriptor - reading it from system property [junit4] 2> 324617 T1172 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged ...which then cascaded into an "IllegalStateException: Connection pool shut down"... [junit4] 2> 324711 T1281 oascc.ZkStateReader.createClusterStateWatchersAndUpdate Updating cluster state from ZooKeeper... [junit4] 2> 324718 T1197 oascc.SolrZkClient.makePath makePath: /overseer_elect/leader [junit4] 2> 324722 T1146 oejsh.ContextHandler.doStop stopped o.e.j.s.ServletContextHandler{/tr_/xq,null} [junit4] 2> 324726 T1197 oasc.Overseer.start Overseer (id=91010183481589766-127.0.0.1:47570_tr_%2Fxq-n_0000000001) starting [junit4] 2> 324727 T1281 oasc.MigrateRouteKeyTest$Indexer.run ERROR Exception while adding document id: id=o/1!118 org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down [junit4] 2> at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:361) [junit4] 2> at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:510) [junit4] 2> at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) [junit4] 2> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) [junit4] 2> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) [junit4] 2> at org.apache.solr.cloud.MigrateRouteKeyTest$Indexer.run(MigrateRouteKeyTest.java:252) [junit4] 2> Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down [junit4] 2> at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:310) [junit4] 2> at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:358) [junit4] 2> ... 5 more [junit4] 2> Caused by: java.lang.IllegalStateException: Connection pool shut down [junit4] 2> at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:167) [junit4] 2> at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:199) [junit4] 2> at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:188) [junit4] 2> at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:448) [junit4] 2> at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) [junit4] 2> at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) [junit4] 2> at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) [junit4] 2> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395) [junit4] 2> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) [junit4] 2> at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:283) [junit4] 2> ... 6 more [junit4] 2> ...lots more "IllegalStateException: Connection pool shut down" for various doc add attempts, ultimatly leading to... [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=MigrateRouteKeyTest -Dtests.method=testDistribSearch -Dtests.seed=26F9FF92782251B6 -Dtests.slow=true -Dtests.locale=sr_CS -Dtests.timezone=Europe/Zaporozhye -Dtests.file.encoding=UTF-8 [junit4] ERROR 28.8s J2 | MigrateRouteKeyTest.testDistribSearch <<< [junit4] > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server Error [junit4] > request: http://127.0.0.1:52363/tr_/xq/admin/collections?action=MIGRATE&collection=collection1&target.collection=migrate_multipleshardtest_targetCollection&split.key=a/1!&forward.timeout=45&wt=javabin&version=2 [junit4] > at __randomizedtesting.SeedInfo.seed([26F9FF92782251B6:A71F718A0F7D318A]:0) [junit4] > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495) [junit4] > at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) [junit4] > at org.apache.solr.cloud.MigrateRouteKeyTest.invoke(MigrateRouteKeyTest.java:139) [junit4] > at org.apache.solr.cloud.MigrateRouteKeyTest.invokeMigrateApi(MigrateRouteKeyTest.java:125) [junit4] > at org.apache.solr.cloud.MigrateRouteKeyTest.multipleShardMigrateTest(MigrateRouteKeyTest.java:198) [junit4] > at org.apache.solr.cloud.MigrateRouteKeyTest.doTest(MigrateRouteKeyTest.java:93) [junit4] > at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:867) [junit4] > at java.lang.Thread.run(Thread.java:724) ...and then... [junit4] 2> 50947 T1145 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 4 threads leaked from SUITE scope at org.apache.solr.cloud.MigrateRouteKeyTest: [junit4] 2> 1) Thread[id=1298, name=Timer-98, state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest] [junit4] 2> at java.lang.Object.wait(Native Method) [junit4] 2> at java.util.TimerThread.mainLoop(Timer.java:552) [junit4] 2> at java.util.TimerThread.run(Timer.java:505) [junit4] 2> 2) Thread[id=1299, name=Thread-532-SendThread(localhost:46743), state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest] [junit4] 2> at java.lang.Thread.sleep(Native Method) [junit4] 2> at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86) [junit4] 2> at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937) [junit4] 2> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) [junit4] 2> 3) Thread[id=1300, name=Thread-532-EventThread, state=WAITING, group=TGRP-MigrateRouteKeyTest] [junit4] 2> at sun.misc.Unsafe.park(Native Method) [junit4] 2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) [junit4] 2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) [junit4] 2> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) [junit4] 2> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491) [junit4] 2> 4) Thread[id=1281, name=Thread-532, state=TIMED_WAITING, group=TGRP-MigrateRouteKeyTest] [junit4] 2> at java.lang.Object.wait(Native Method) [junit4] 2> at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:204) [junit4] 2> at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:139) [junit4] 2> at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:98) [junit4] 2> at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:88) [junit4] 2> at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:183) [junit4] 2> at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:240) [junit4] 2> at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:501) [junit4] 2> at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) [junit4] 2> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) [junit4] 2> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) [junit4] 2> at org.apache.solr.cloud.MigrateRouteKeyTest$Indexer.run(MigrateRouteKeyTest.java:252)
        Hide
        Anshum Gupta added a comment -

        I tried running the test 50 times over but didn't yet run into it.

        Looks like it had issues at one of the 2 spots considering that it never seemed to have asked the source leader to split the index:

        1. While constructing the WaitForState call i.e. waiting for the replicas to be seen as active on temp source leader (More likely as I couldn't spot any logs from there)
        2. During the above mentioned WaitForState call.
        Show
        Anshum Gupta added a comment - I tried running the test 50 times over but didn't yet run into it. Looks like it had issues at one of the 2 spots considering that it never seemed to have asked the source leader to split the index: While constructing the WaitForState call i.e. waiting for the replicas to be seen as active on temp source leader (More likely as I couldn't spot any logs from there) During the above mentioned WaitForState call.
        Hide
        Shalin Shekhar Mangar added a comment -

        This should fix it. It was possible that the temp collection had been created (since we wait for it) but the leader hadn't been elected yet causing the NPE. We should use zkStateReader.getLeaderRetry for this.

        Show
        Shalin Shekhar Mangar added a comment - This should fix it. It was possible that the temp collection had been created (since we wait for it) but the leader hadn't been elected yet causing the NPE. We should use zkStateReader.getLeaderRetry for this.
        Hide
        ASF subversion and git services added a comment -

        Commit 1555083 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1555083 ]

        SOLR-5601: Use zkStateReader.getLeaderRetry to find the leader of a newly created temp collection

        Show
        ASF subversion and git services added a comment - Commit 1555083 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1555083 ] SOLR-5601 : Use zkStateReader.getLeaderRetry to find the leader of a newly created temp collection
        Hide
        ASF subversion and git services added a comment -

        Commit 1555084 from shalin@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1555084 ]

        SOLR-5601: Use zkStateReader.getLeaderRetry to find the leader of a newly created temp collection

        Show
        ASF subversion and git services added a comment - Commit 1555084 from shalin@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1555084 ] SOLR-5601 : Use zkStateReader.getLeaderRetry to find the leader of a newly created temp collection
        Hide
        Shalin Shekhar Mangar added a comment -

        I haven't seen this issue again on jenkins and couldn't reproduce this after the fix on my ec2 test box.

        Thanks Hoss and Anshum!

        Show
        Shalin Shekhar Mangar added a comment - I haven't seen this issue again on jenkins and couldn't reproduce this after the fix on my ec2 test box. Thanks Hoss and Anshum!

          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development