Apache Storm / STORM-3664

Nimbus cannot recover from LocalFsBlobStore deletion

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0, 2.2.0
    • Fix Version/s: None
    • Component/s: blobstore, storm-server
    • Labels: None

    Description

      When all Nimbus instances in a cluster lose access to previously stored blobs while at least one topology is deployed, the cluster cannot recover: no node is ever elected leader because the blobs are missing. Recovery is only possible by manually removing the blob and topology data from ZooKeeper.

      I understand that the LocalFs blob store implementation is not particularly suited for high availability deployments. However, this issue prevents sensible automated disaster recovery on small deployments, where a full HDFS deployment would provide no benefit and would only add complexity.

      Reproduction Steps

      1. Deploy one or multiple Nimbus instances
      2. Deploy a topology (such as the WordCount example)
      3. Stop all Nimbus instances
      4. Remove all blob directories
      5. Start all Nimbus instances
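
      Step 4 can be scripted for repeatable testing. The following is only a minimal sketch: it assumes each Nimbus keeps its blobs in a "blobs" directory directly under its storm.local.dir (that is, no blobstore.dir override is configured), and the function name remove_blob_dirs is hypothetical. Run it only while all Nimbus instances are stopped.

```python
import shutil
from pathlib import Path


def remove_blob_dirs(storm_local_dirs):
    """Delete the 'blobs' directory under each given Nimbus local dir.

    Assumption: blobs live in <storm.local.dir>/blobs. Adjust the path
    if your cluster configures a different blob store location.
    Returns the list of directories that were actually removed.
    """
    removed = []
    for local_dir in storm_local_dirs:
        blob_dir = Path(local_dir) / "blobs"
        if blob_dir.is_dir():
            shutil.rmtree(blob_dir)
            removed.append(blob_dir)
    return removed
```

      Directories that do not exist are skipped, so the helper can be run against every Nimbus host's local dir without checking first.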

      Expected Behavior

      When a topology's blobs are permanently lost, the topology itself should be marked as failed so that the cluster stays available. As it stands, a single lost topology suffices to take down the entire system.
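
      The requested behavior could look roughly like the check below. This is an illustrative sketch and not Storm's actual code: the function partition_topologies is hypothetical, and the three key suffixes are an assumption about how Nimbus names the per-topology blobs. The idea is to fail only the topologies whose blobs are missing instead of blocking leader election for the whole cluster.

```python
# Assumed per-topology blob key suffixes (jar, serialized topology, config).
REQUIRED_SUFFIXES = ("-stormjar.jar", "-stormcode.ser", "-stormconf.ser")


def partition_topologies(topology_ids, available_blob_keys):
    """Split topologies into (recoverable, failed) by blob availability.

    'failed' holds (topology_id, missing_keys) pairs so an operator can
    see exactly which blobs were lost for each topology.
    """
    available = set(available_blob_keys)
    recoverable, failed = [], []
    for topo_id in topology_ids:
        missing = [topo_id + suffix for suffix in REQUIRED_SUFFIXES
                   if topo_id + suffix not in available]
        if missing:
            failed.append((topo_id, missing))
        else:
            recoverable.append(topo_id)
    return recoverable, failed
```

      With such a split, a Nimbus could accept leadership for the recoverable topologies and mark the rest as failed.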

          People

            Assignee: Unassigned
            Reporter: Johannes Donath