Solr / SOLR-4099

Suspected failure of the ZooKeeper client thread to call back the watcher, which leaves the Overseer collection processor unable to work normally.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA, 4.0-BETA, 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: SolrCloud
    • Labels:
      None
    • Environment:

      ZooKeeper version: 3.2

      Description

      In our test environment, the ZooKeeper version is older than the required version; we are not using Solr's default version, 3.3.6.

      The Overseer collection processor stopped working. A thread dump shows the Overseer waiting in LatchChildWatcher.await.
      Checking the ZooKeeper node /overseer/collection-queue-work shows that many collection operations are blocked in the queue.

      After reviewing the logic, we suspect that the ZooKeeper client did not call back the watch event registered on the path "/overseer/collection-queue-work". Unfortunately, the relevant logging is at debug level.

      This case is rare. But when it happens it is fatal: we have to stop the leader server.

      Suggested compensating solution: do not wait indefinitely for the notification. Instead, wait for a bounded time (perhaps ten minutes, half an hour, or some other value) and then recheck the queue. Of course, if the notification does arrive, processing continues normally right away.
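
The suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch or Solr's actual code: `BoundedWaitQueueSketch`, `TimedLatch`, `fire()`, and `take()` are hypothetical names, an in-memory queue stands in for /overseer/collection-queue-work, and the 100 ms recheck interval is chosen only so the demo finishes quickly.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class BoundedWaitQueueSketch {

    /** Hypothetical stand-in for a watcher latch; fire() plays the role of the watch callback. */
    static final class TimedLatch {
        private final Object lock = new Object();
        private boolean fired = false;

        /** Called when the (simulated) watch event is delivered. */
        void fire() {
            synchronized (lock) {
                fired = true;
                lock.notifyAll();
            }
        }

        /** Waits until fired or until timeoutMs elapses; returns true only if fired. */
        boolean await(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            synchronized (lock) {
                while (!fired) {
                    long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                    if (remainingMs <= 0) {
                        return false; // timed out: the caller rechecks the queue anyway
                    }
                    lock.wait(remainingMs);
                }
                fired = false; // reset so the latch can be reused for the next wait
                return true;
            }
        }
    }

    /** Blocks until an item is available, rechecking the queue at least every recheckMs. */
    static String take(Queue<String> queue, TimedLatch latch, long recheckMs)
            throws InterruptedException {
        while (true) {
            String head = queue.poll();
            if (head != null) {
                return head;
            }
            // The result is ignored on purpose: notified or timed out, we look again.
            latch.await(recheckMs);
        }
    }

    public static void main(String[] args) throws Exception {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        TimedLatch latch = new TimedLatch();

        // Simulate a lost watch event: an item is enqueued but fire() is never called.
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
            queue.add("createcollection");
        });
        producer.start();

        System.out.println(take(queue, latch, 100));
        producer.join();
    }
}
```

With an unbounded wait, the consumer in this demo would hang forever because the notification never arrives; with the bounded wait it picks up the queued operation on the next recheck.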

      1. patch-4099.txt
        2 kB
        Raintung Li

        Activity

        Raintung Li added a comment -

        example

        Commit Tag Bot added a comment -

        [branch_4x commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1412142

        SOLR-4099: Allow the collection api work queue to make forward progress even when it's watcher is not fired for some reason.

        Commit Tag Bot added a comment -

        [trunk commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1412140

        SOLR-4099: Allow the collection api work queue to make forward progress even when it's watcher is not fired for some reason.

        Mark Miller added a comment -

        Thanks Raintung!

        Commit Tag Bot added a comment -

        [branch_4x commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1412142

        SOLR-4099: Allow the collection api work queue to make forward progress even when it's watcher is not fired for some reason.


          People

          • Assignee:
            Mark Miller
            Reporter:
            Raintung Li
          • Votes:
            0
            Watchers:
            3

              Time Tracking

              Estimated:
              12h
              Remaining:
              12h
              Logged:
              Not Specified
