Apache Storm / STORM-1515

LocalState corruption after hard reboot on Windows


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0, 0.9.4
    • Fix Version/s: 2.1.0
    • Component/s: storm-core
    • Environment: Windows Server 2012

    Description

      After a hard reboot on Windows, I'm seeing LocalState files for the supervisor that contain a few hundred NUL bytes. Deserializing them throws a StreamCorruptedException, and the supervisor fails to start.

      2016-01-27T17:04:10.848-0700 b.s.d.supervisor [INFO] Starting supervisor with id 45b27917-4ca0-4d96-8727-914909e3ac47 at host jtorbiak-ws.nj.invidi.com
      2016-01-27T17:04:11.673-0700 b.s.event [ERROR] Error when processing event
      java.lang.RuntimeException: java.io.StreamCorruptedException: invalid stream header: 00000000
              at backtype.storm.serialization.DefaultSerializationDelegate.deserialize(DefaultSerializationDelegate.java:56) ~[storm-core-0.9.4.jar:0.9.4]
              at backtype.storm.utils.Utils.deserialize(Utils.java:89) ~[storm-core-0.9.4.jar:0.9.4]
              at backtype.storm.utils.LocalState.deserializeLatestVersion(LocalState.java:65) ~[storm-core-0.9.4.jar:0.9.4]
              at backtype.storm.utils.LocalState.snapshot(LocalState.java:47) ~[storm-core-0.9.4.jar:0.9.4]
              at backtype.storm.utils.LocalState.get(LocalState.java:72) ~[storm-core-0.9.4.jar:0.9.4]
              at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:234) ~[storm-core-0.9.4.jar:0.9.4]
              at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
              at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
              at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
              at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
              at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
              at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
              at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
              at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
      Caused by: java.io.StreamCorruptedException: invalid stream header: 00000000
              at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806) ~[na:1.8.0_45]
              at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.8.0_45]
              at backtype.storm.serialization.DefaultSerializationDelegate.deserialize(DefaultSerializationDelegate.java:51) ~[storm-core-0.9.4.jar:0.9.4]
              ... 13 common frames omitted
      2016-01-27T17:04:11.674-0700 b.s.util [ERROR] Halting process: ("Error when processing an event")
      java.lang.RuntimeException: ("Error when processing an event")
              at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
              at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
              at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
              at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
              at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
      2016-01-27T17:04:11.695-0700 b.s.d.supervisor [INFO] Shutting down supervisor 45b27917-4ca0-4d96-8727-914909e3ac47
      

      This is very similar to STORM-307, except the LocalState files contain NULs instead of being empty. I'm guessing this corruption type is specific to Windows.
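
The failure mode can be reproduced outside Storm. The following is a minimal sketch, not Storm's code and not the fix applied for this issue: `NulStateDemo`, `isAllNul`, and `headerError` are hypothetical names introduced for illustration. It assumes LocalState data goes through standard Java serialization, as the `ObjectInputStream` frames in the trace suggest, and shows that a buffer of NUL bytes is rejected when the stream header is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;

// Hypothetical demo class; it is not part of storm-core.
class NulStateDemo {

    // True when the buffer is non-empty and every byte is NUL (0x00).
    // An empty buffer (the STORM-307 case) is deliberately reported as false.
    static boolean isAllNul(byte[] data) {
        if (data.length == 0) {
            return false;
        }
        for (byte b : data) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    // Returns the StreamCorruptedException message produced while the
    // ObjectInputStream constructor reads the stream header, or null if
    // the header is accepted. Other IOExceptions (for example EOFException
    // on a buffer shorter than the header) propagate to the caller.
    static String headerError(byte[] data) throws IOException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return null;
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a LocalState file that came back as a few hundred NULs.
        byte[] corrupted = new byte[300];
        System.out.println("all NUL: " + isAllNul(corrupted));
        System.out.println("header error: " + headerError(corrupted));

        // A correctly serialized value passes the header check.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject("state");
        }
        System.out.println("valid header error: " + headerError(buf.toByteArray()));
    }
}
```

A check like `isAllNul` could in principle let a reader of the state file tell this kind of corruption apart from other deserialization failures before deciding how to recover, but whether and where to apply such a check is a design choice this sketch does not settle.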

          People

            Assignee: Tim Frison (frison)
            Reporter: Jordan Torbiak (torbiak)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h