Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-11457

HealthCheckHandlerTest.testHealthCheckHandlerSolrJ() non-reproducible failures -- test order/timing realted?

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 8.0
    • None
    • None

    Description

      jenkins emais indicates this test has failed a fair bit since it was added 2 months ago ... the only test log i managed to grab before they were deleted was from https://builds.apache.org/job/Lucene-Solr-Tests-master/2112/

      IIUC the root problem is that testHealthCheckHandler() is deliberately expring the ZK session, so if that actually happens (not garunteed, see SOLR-11456) and HealthCheckHandlerTest.testHealthCheckHandlerSolrJ() runs after it, then the helath check with fail and testHealthCheckHandlerSolrJ() will give a false negative.

      Attachments

        1. jenkins.2112.HealthCheckHandlerTest.txt
          21.57 MB
          Chris M. Hostetter

        Issue Links

          Activity

            note this relevant bit from the attached jenkins log...

               [junit4]   2> 7608228 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandler
               [junit4]   2> 7608326 INFO  (qtp723337605-16665) [n:127.0.0.1:59629_solr    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0
               [junit4]   2> 7608327 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [    ] o.a.s.c.MiniSolrCloudCluster Expired zookeeper session 98767291805859843 from node htt
            p://127.0.0.1:59629/solr
               [junit4]   2> 7608347 INFO  (zkCallback-2223-thread-1) [    ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
               [junit4]   2> 7608514 WARN  (zkCallback-2218-thread-2-processing-n:127.0.0.1:59629_solr) [n:127.0.0.1:59629_solr    ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@7e690903 name: ZooKeeperConnection Watcher:127.0.0.1:54525/solr got event WatchedEvent state:Disconnected type:None path:null path: null type: None
               [junit4]   2> 7608514 WARN  (zkCallback-2218-thread-2-processing-n:127.0.0.1:59629_solr) [n:127.0.0.1:59629_solr    ] o.a.s.c.c.ConnectionManager zkClient has disconnected
               [junit4]   2> 7608521 INFO  (qtp723337605-16663) [n:127.0.0.1:59629_solr    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0
               [junit4]   2> 7608607 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandler
               [junit4]   2> 7608619 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerWithCloudClient
               [junit4]   2> 7608647 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerWithCloudClient
               [junit4]   2> 7608688 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerSolrJ
               [junit4]   2> 7608706 INFO  (qtp723337605-16668) [n:127.0.0.1:59629_solr    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=503 QTime=0
               [junit4]   2> 7608707 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerSolrJ
               [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=HealthCheckHandlerTest -Dtests.method=testHealthCheckHandlerSolrJ -Dtests.seed=D95E86B813E36BF4 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=es-EC -Dtests.timezone=AGT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
               [junit4] ERROR   0.06s J1 | HealthCheckHandlerTest.testHealthCheckHandlerSolrJ <<<
               [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:59629/solr: Host Unavailable: Not connected to zk
            ...
            

            as opposed to running the same seed (w/o all test methods) on my local machine where it passes locally – IIUC because the "closing socket connection and attempting reconnect" has a chance to kick in before testHealthCheckHandlerSolrJ starts

            $ ant test  -Dtestcase=HealthCheckHandlerTest -Dtests.seed=D95E86B813E36BF4 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=es-EC -Dtests.timezone=AGT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
            ...
               [junit4]   2> 1976 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandler
               [junit4]   2> 2172 INFO  (qtp2093157283-26) [n:127.0.0.1:46082_solr    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=31
               [junit4]   2> 2213 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [    ] o.a.s.c.MiniSolrCloudCluster Expired zookeeper session 98801923520528387 from node http://127.0.0.1:46082/solr
               [junit4]   2> 2214 INFO  (zkCallback-13-thread-1) [    ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
               [junit4]   2> 2214 WARN  (jetty-launcher-1-thread-1-SendThread(127.0.0.1:44037)) [n:127.0.0.1:46082_solr    ] o.a.z.ClientCnxn Session 0x15f03d372910003 for server null, unexpected error, closing socket connection and attempting reconnect
               [junit4]   2> java.nio.channels.CancelledKeyException
               [junit4]   2> 	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
               [junit4]   2> 	at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
               [junit4]   2> 	at org.apache.zookeeper.ClientCnxnSocketNIO.enableWrite(ClientCnxnSocketNIO.java:389)
               [junit4]   2> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:373)
               [junit4]   2> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
               [junit4]   2> 2315 INFO  (qtp2093157283-28) [n:127.0.0.1:46082_solr    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0
               [junit4]   2> 2318 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandler
               [junit4]   2> 2319 WARN  (zkCallback-8-thread-3-processing-n:127.0.0.1:46082_solr) [n:127.0.0.1:46082_solr    ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@30d07b10 name: ZooKeeperConnection Watcher:127.0.0.1:44037/solr got event WatchedEvent state:Disconnected type:None path:null path: null type: None
               [junit4]   2> 2319 WARN  (zkCallback-8-thread-3-processing-n:127.0.0.1:46082_solr) [n:127.0.0.1:46082_solr    ] o.a.s.c.c.ConnectionManager zkClient has disconnected
               [junit4] OK      0.37s | HealthCheckHandlerTest.testHealthCheckHandler
               [junit4]   2> 2336 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerWithCloudClient
               [junit4]   2> 2338 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerWithCloudClient
               [junit4] OK      0.01s | HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient
               [junit4]   2> 2342 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerSolrJ
               [junit4]   2> 2345 INFO  (qtp2093157283-26) [n:127.0.0.1:46082_solr    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0
               [junit4]   2> 2346 INFO  (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerSolrJ
               [junit4] OK      0.01s | HealthCheckHandlerTest.testHealthCheckHandlerSolrJ
            
            
            hossman Chris M. Hostetter added a comment - note this relevant bit from the attached jenkins log... [junit4] 2> 7608228 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandler [junit4] 2> 7608326 INFO (qtp723337605-16665) [n:127.0.0.1:59629_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0 [junit4] 2> 7608327 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [ ] o.a.s.c.MiniSolrCloudCluster Expired zookeeper session 98767291805859843 from node htt p://127.0.0.1:59629/solr [junit4] 2> 7608347 INFO (zkCallback-2223-thread-1) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0) [junit4] 2> 7608514 WARN (zkCallback-2218-thread-2-processing-n:127.0.0.1:59629_solr) [n:127.0.0.1:59629_solr ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@7e690903 name: ZooKeeperConnection Watcher:127.0.0.1:54525/solr got event WatchedEvent state:Disconnected type:None path:null path: null type: None [junit4] 2> 7608514 WARN (zkCallback-2218-thread-2-processing-n:127.0.0.1:59629_solr) [n:127.0.0.1:59629_solr ] o.a.s.c.c.ConnectionManager zkClient has disconnected [junit4] 2> 7608521 INFO (qtp723337605-16663) [n:127.0.0.1:59629_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0 [junit4] 2> 7608607 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandler [junit4] 2> 7608619 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerWithCloudClient [junit4] 2> 7608647 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerWithCloudClient [junit4] 2> 7608688 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerSolrJ [junit4] 2> 7608706 INFO (qtp723337605-16668) [n:127.0.0.1:59629_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=503 QTime=0 [junit4] 2> 7608707 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerSolrJ [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=HealthCheckHandlerTest -Dtests.method=testHealthCheckHandlerSolrJ -Dtests.seed=D95E86B813E36BF4 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=es-EC -Dtests.timezone=AGT -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.06s J1 | HealthCheckHandlerTest.testHealthCheckHandlerSolrJ <<< [junit4] > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:59629/solr: Host Unavailable: Not connected to zk ... as opposed to running the same seed (w/o all test methods) on my local machine where it passes locally – IIUC because the "closing socket connection and attempting reconnect" has a chance to kick in before testHealthCheckHandlerSolrJ starts $ ant test -Dtestcase=HealthCheckHandlerTest -Dtests.seed=D95E86B813E36BF4 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=es-EC -Dtests.timezone=AGT -Dtests.asserts=true -Dtests.file.encoding=UTF-8 ... [junit4] 2> 1976 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandler [junit4] 2> 2172 INFO (qtp2093157283-26) [n:127.0.0.1:46082_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=31 [junit4] 2> 2213 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [ ] o.a.s.c.MiniSolrCloudCluster Expired zookeeper session 98801923520528387 from node http://127.0.0.1:46082/solr [junit4] 2> 2214 INFO (zkCallback-13-thread-1) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0) [junit4] 2> 2214 WARN (jetty-launcher-1-thread-1-SendThread(127.0.0.1:44037)) [n:127.0.0.1:46082_solr ] o.a.z.ClientCnxn Session 0x15f03d372910003 for server null, unexpected error, closing socket connection and attempting reconnect [junit4] 2> java.nio.channels.CancelledKeyException [junit4] 2> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) [junit4] 2> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) [junit4] 2> at org.apache.zookeeper.ClientCnxnSocketNIO.enableWrite(ClientCnxnSocketNIO.java:389) [junit4] 2> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:373) [junit4] 2> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [junit4] 2> 2315 INFO (qtp2093157283-28) [n:127.0.0.1:46082_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0 [junit4] 2> 2318 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandler-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandler [junit4] 2> 2319 WARN (zkCallback-8-thread-3-processing-n:127.0.0.1:46082_solr) [n:127.0.0.1:46082_solr ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@30d07b10 name: ZooKeeperConnection Watcher:127.0.0.1:44037/solr got event WatchedEvent state:Disconnected type:None path:null path: null type: None [junit4] 2> 2319 WARN (zkCallback-8-thread-3-processing-n:127.0.0.1:46082_solr) [n:127.0.0.1:46082_solr ] o.a.s.c.c.ConnectionManager zkClient has disconnected [junit4] OK 0.37s | HealthCheckHandlerTest.testHealthCheckHandler [junit4] 2> 2336 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerWithCloudClient [junit4] 2> 2338 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerWithCloudClient [junit4] OK 0.01s | HealthCheckHandlerTest.testHealthCheckHandlerWithCloudClient [junit4] 2> 2342 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testHealthCheckHandlerSolrJ [junit4] 2> 2345 INFO (qtp2093157283-26) [n:127.0.0.1:46082_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/health params={wt=javabin&version=2} status=0 QTime=0 [junit4] 2> 2346 INFO (TEST-HealthCheckHandlerTest.testHealthCheckHandlerSolrJ-seed#[D95E86B813E36BF4]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testHealthCheckHandlerSolrJ [junit4] OK 0.01s | HealthCheckHandlerTest.testHealthCheckHandlerSolrJ

            pretty sure these failures will be fixed automatically by the patch i just uploaded to SOLR-11456

            hossman Chris M. Hostetter added a comment - pretty sure these failures will be fixed automatically by the patch i just uploaded to SOLR-11456

            fixed via SOLR-11456

            hossman Chris M. Hostetter added a comment - fixed via SOLR-11456

            People

              hossman Chris M. Hostetter
              hossman Chris M. Hostetter
              Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: