  1. Apache Storm
  2. STORM-388

make supervisor more resilient to missing .ser files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.2-incubating
    • Fix Version/s: None
    • Component/s: storm-core

    Description

      Currently the supervisor process cannot run reliably without some kind of process-supervision software such as systemd. It exits too often when a .ser file is missing, logging "[INFO] Halting process".

      examples:

      a)

      2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down and clearing state for id efd37b78-eb69-46a1-b317-9b5b4ba00584. Current supervisor time: 1404412373. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1404412311, :storm-id "Storm-throughput-test-7-1404411531", :executors #{[2 2] [4 4] [6 6] [-1 -1]}, :port 6702}
      2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
      2014-07-03 20:32:54 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-7-1404411531
      2014-07-03 20:32:55 b.s.d.supervisor [INFO] Shut down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
      2014-07-03 20:32:55 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-7-1404411531", :executors ([6 6] [4 4] [2 2])} for this supervisor 55f2b426-c170-4e48-a768-2a82c0f383ce on port 6702 with id 6518a348-1fea-4401-8b7b-365b4ac36279
      2014-07-03 20:32:55 b.s.event [ERROR] Error when processing event
      java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-7-1404411531/stormconf.ser' does not exist

      b)

      2014-07-03 20:32:43 o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
      2014-07-03 20:32:51 o.a.z.ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session 0x146fb27b8400027 has expired, closing socket connection
      2014-07-03 20:32:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: LOST
      8d-1069-44e3-b3ca-c25390cbf719
      2014-07-03 10:29:22 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-1-1404335149
      2014-07-03 10:29:22 b.s.d.supervisor [INFO] Shut down 167cf900-2ec6-499b-9c09-12c1e48dbc08:f776588d-1069-44e3-b3ca-c25390cbf719
      2014-07-03 10:29:22 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-1-1404335149", :executors ([3 3] [5 5] [4 4] [2 2] [1 1])} for this supervisor 167cf900-2ec6-499b-9c09-12c1e48dbc08 on port 6702 with id 1dd28a8e-53cd-4af3-a4ae-7ebae0b9427f
      2014-07-03 10:29:22 b.s.event [ERROR] Error when processing event
      java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-1-1404335149/stormconf.ser' does not exist

      In both cases there were ZooKeeper connection problems before the missing .ser file error.
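
The failure pattern in both examples (code removed, worker launch attempted, FileNotFoundException on stormconf.ser, process halted) suggests one possible mitigation: check for the file before launching and skip the launch instead of letting the exception halt the supervisor. The following is only a minimal sketch of that idea in Java, not the fix that was actually applied to Storm. The class WorkerLaunchGuard, its methods, and the Outcome enum are hypothetical names; only the directory layout (supervisor/stormdist/<storm-id>/stormconf.ser) is taken from the logs above.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical guard for illustration; this is not Storm's supervisor code. */
final class WorkerLaunchGuard {

    /** Result of a launch attempt in this sketch. */
    enum Outcome { LAUNCHED, SKIPPED_MISSING_CONF }

    private WorkerLaunchGuard() {}

    /** Builds localDir/supervisor/stormdist/stormId/stormconf.ser, the layout seen in the logs. */
    static Path stormConfPath(Path localDir, String stormId) {
        return localDir.resolve("supervisor")
                       .resolve("stormdist")
                       .resolve(stormId)
                       .resolve("stormconf.ser");
    }

    /**
     * Runs launchWorker only if stormconf.ser exists for the topology.
     * A missing file is reported and treated as recoverable, so the caller
     * can retry on its next sync instead of halting the whole process.
     */
    static Outcome launchIfCodePresent(Path localDir, String stormId, Runnable launchWorker) {
        Path conf = stormConfPath(localDir, stormId);
        if (!Files.isRegularFile(conf)) {
            System.err.println("Skipping worker launch, missing " + conf);
            return Outcome.SKIPPED_MISSING_CONF;
        }
        launchWorker.run();
        return Outcome.LAUNCHED;
    }
}
```

A check like this narrows the window but does not close it: the file can still be removed between the check and the read, so the code that deserializes stormconf.ser would also need to treat FileNotFoundException as a retryable condition.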

          People

            Assignee: Unassigned
            Reporter: Radim Kolar (hsn)
            Votes: 0
            Watchers: 3
