SOLR-9846

Overseer is not always closed after being started.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      We should interrupt it on close.
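
As a rough illustration of what interrupting on close can look like, here is a minimal sketch. The `Worker` class, its 60-second sleep and the 5-second join timeout are assumptions made for the example; this is not Solr's actual Overseer code.

```java
import java.io.Closeable;

/** Hypothetical stand-in for a long-sleeping Overseer worker; not Solr's actual class. */
final class InterruptOnCloseSketch {

  static final class Worker implements Runnable, Closeable {
    private final Thread thread = new Thread(this, "sketch-worker");
    private volatile boolean closed = false;

    void start() {
      thread.start();
    }

    @Override
    public void run() {
      while (!closed) {
        try {
          Thread.sleep(60_000); // long pause between work cycles (assumed value)
        } catch (InterruptedException e) {
          // Interrupted by close(): restore the flag and leave the loop.
          Thread.currentThread().interrupt();
          return;
        }
      }
    }

    @Override
    public void close() {
      closed = true;
      thread.interrupt(); // wake the thread if it is sleeping
      try {
        thread.join(5_000); // wait so the thread cannot outlive close()
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    boolean isAlive() {
      return thread.isAlive();
    }
  }

  /** Starts a worker, closes it, and reports whether its thread has stopped. */
  static boolean closeStopsWorker() {
    Worker w = new Worker();
    w.start();
    w.close();
    return !w.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("worker stopped after close: " + closeStopsWorker());
  }
}
```

Without the `interrupt()` call, `close()` would have to wait out the remainder of the sleep, which is how a thread can leak past the end of a test suite.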

          Activity

          jira-bot ASF subversion and git services added a comment -

          Commit 7dec783b287ab554cc781622b4d6127e553fd2ae in lucene-solr's branch refs/heads/master from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7dec783 ]

          SOLR-9846: OverseerAutoReplicaFailoverThread can take too long to stop and leak out of unit tests.

          jira-bot ASF subversion and git services added a comment -

          Commit 89f0149feeb02545d902493d1cfae76a700692ad in lucene-solr's branch refs/heads/branch_6x from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=89f0149 ]

          SOLR-9846: OverseerAutoReplicaFailoverThread can take too long to stop and leak out of unit tests.

          markrmiller@gmail.com Mark Miller added a comment -

          I'm still seeing this.

          jira-bot ASF subversion and git services added a comment -

          Commit b5c7bd1d10f3df8b2a622f3d76bee72f028cc483 in lucene-solr's branch refs/heads/master from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b5c7bd1 ]

          SOLR-9846: Try and make sure Overseer is always closed in tests and it's threads are done when it exists close.

          jira-bot ASF subversion and git services added a comment -

          Commit 7da344011441824dd89a18d968661081f84f742c in lucene-solr's branch refs/heads/branch_6x from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7da3440 ]

          SOLR-9846: Try and make sure Overseer is always closed in tests and it's threads are done when it exists close.

          markrmiller@gmail.com Mark Miller added a comment -

          Digging in, it looks like the Overseer may not have been getting closed in some tests. I added some tracking code to check that the Overseer is closed and made some small tweaks that may help address the problem. If not, the tracking should tell us more if this continues.
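
The kind of tracking described here can be sketched roughly as follows. This is a simplified, hypothetical tracker written for illustration; it is not Solr's `ObjectReleaseTracker`, and all names in it are assumptions.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical release tracker: remembers where each object was tracked until it is released. */
final class ReleaseTrackerSketch {
  // Identity-based map so two equal-but-distinct objects are tracked separately.
  private final Map<Object, Exception> live = new IdentityHashMap<>();

  /** Records the object together with a stack trace of the call site. */
  synchronized void track(Object o) {
    live.put(o, new Exception("tracked: " + o.getClass().getSimpleName()));
  }

  /** Marks the object as properly closed. */
  synchronized void release(Object o) {
    live.remove(o);
  }

  synchronized int unreleasedCount() {
    return live.size();
  }

  /** Throws if anything tracked was never released, attaching one tracking site as the cause. */
  synchronized void assertAllReleased() {
    if (!live.isEmpty()) {
      Exception where = live.values().iterator().next();
      throw new AssertionError(live.size() + " object(s) were not released", where);
    }
  }
}
```

A test harness would call `track` when the object is started, `release` when it is closed, and `assertAllReleased` during suite teardown, so a leak fails the suite with the stack trace of the start call.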

          jira-bot ASF subversion and git services added a comment -

          Commit a243befdbb4c011c33c27b1b864d4a202b401675 in lucene-solr's branch refs/heads/master from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a243bef ]

          SOLR-9846: Track Overseer close better.

          jira-bot ASF subversion and git services added a comment -

          Commit 17f441e8e263ce29e2b1da0aa11506a9541e3d3a in lucene-solr's branch refs/heads/branch_6x from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=17f441e ]

          SOLR-9846: Track Overseer close better.

          jira-bot ASF subversion and git services added a comment -

          Commit ed05debb4e223e07aeeccdc0a802b8c2a514ba23 in lucene-solr's branch refs/heads/master from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ed05deb ]

          SOLR-9846: Overseer is not always closed after being started.

          markrmiller@gmail.com Mark Miller added a comment -

          We can still end up in a situation where leader election starts a new Overseer that is never closed.

          markrmiller@gmail.com Mark Miller added a comment -
             [junit4] ERROR   0.00s | StreamExpressionTest (suite) <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: ObjectTracker found 1 object(s) that were not released!!! [Overseer]
             [junit4]    > org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException
             [junit4]    > 	at org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:42)
             [junit4]    > 	at org.apache.solr.cloud.Overseer.start(Overseer.java:523)
             [junit4]    > 	at org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:748)
             [junit4]    > 	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
             [junit4]    > 	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
             [junit4]    > 	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:56)
             [junit4]    > 	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:348)
             [junit4]    > 	at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
             [junit4]    > 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             [junit4]    > 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             [junit4]    > 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
             [junit4]    > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
             [junit4]    > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([D0C5E99506097E9D]:0)
             [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.teardownTestCases(SolrTestCaseJ4.java:301)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]    > Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.client.solrj.io.stream.StreamExpressionTest: 
             [junit4]    >    1) Thread[id=4039, name=OverseerHdfsCoreFailoverThread-97477342480957451-127.0.0.1:45975_solr-n_0000000003, state=TIMED_WAITING, group=Overseer Hdfs SolrCore Failover Thread.]
             [junit4]    >         at java.lang.Thread.sleep(Native Method)
             [junit4]    >         at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.run(OverseerAutoReplicaFailoverThread.java:139)
             [junit4]    >         at java.lang.Thread.run(Thread.java:745)
          
          jira-bot ASF subversion and git services added a comment -

          Commit b30b6c58f70b79e3b8055265d213693fbee56ff5 in lucene-solr's branch refs/heads/master from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b30b6c5 ]

          SOLR-9846: Don't run Overseer threads if CoreContainer is shutdown.
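
The idea in this commit message can be sketched as a guard in the start path. The `Container` and `Leader` classes below are hypothetical stand-ins and do not reflect the actual change to Solr; note also that a plain flag check like this still leaves a small window if shutdown happens between the check and the thread start.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class GuardedStartSketch {

  /** Hypothetical container that exposes a shutdown flag. */
  static final class Container {
    private final AtomicBoolean shutDown = new AtomicBoolean(false);

    void shutdown() {
      shutDown.set(true);
    }

    boolean isShutDown() {
      return shutDown.get();
    }
  }

  /** Hypothetical leader whose start() may be invoked late by an election callback. */
  static final class Leader {
    private final Container container;
    private Thread worker;

    Leader(Container container) {
      this.container = container;
    }

    /** Returns true only if a worker thread was actually started. */
    synchronized boolean start() {
      if (container.isShutDown()) {
        // The election fired after shutdown: starting now would leave a thread nobody closes.
        return false;
      }
      worker = new Thread(() -> { }, "sketch-overseer-worker"); // trivial body for the demo
      worker.start();
      return true;
    }
  }
}
```

This matches the failure shown in the stack trace above, where an election watcher callback reached `Overseer.start` on an executor thread and the resulting Overseer was never released.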

          jira-bot ASF subversion and git services added a comment -

          Commit 4de034a2931ed923de676e9cbefb21e4ca366601 in lucene-solr's branch refs/heads/branch_6x from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4de034a ]

          SOLR-9846: Overseer is not always closed after being started.

          Conflicts:
            solr/CHANGES.txt
          jira-bot ASF subversion and git services added a comment -

          Commit 20cac581713ee26af029aaa93e708eade65a2bd1 in lucene-solr's branch refs/heads/branch_6x from markrmiller
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=20cac58 ]

          SOLR-9846: Don't run Overseer threads if CoreContainer is shutdown.


            People

            • Assignee:
              markrmiller@gmail.com Mark Miller
              Reporter:
              markrmiller@gmail.com Mark Miller
            • Votes:
              0
              Watchers:
              2
