  1. Apache Storm
  2. STORM-2667

Exception Handling in the AbstractHdfsBolt causes bolt to restart

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: storm-hdfs
    • Labels: None

    Description

      While recently reviewing the HDFSBolt code in response to a question on the mailing list, I noticed that the abstract bolt fails a tuple if an IOException is thrown while trying to write it out, and then forces a sync in those cases.

      https://github.com/apache/storm/blob/64e29f365c9b5d3e15b33f33ab64e200345333e4/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AbstractHdfsBolt.java#L150-L160

      A RuntimeException thrown by the formatter, on the other hand, bubbles up and forces the worker to restart.

      Any IOException thrown by an HDFS output stream means that, at that point, the stream is closed and cannot be used any more. As part of our recovery we try to sync, but this also fails because the stream was closed by the exception that was thrown. The failed sync results in a RuntimeException being thrown and the entire worker being restarted.
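
      The failure mode described above can be illustrated with a small, self-contained sketch. It does not use the Hadoop or storm-hdfs APIs: ClosingStreamSketch and FailureModeSketch are hypothetical names, and the "stream closes itself on a failed write" behavior is modeled as an assumption taken from the description.

```java
import java.io.IOException;

/** Hypothetical stand-in for an HDFS output stream; not the Hadoop API. */
class ClosingStreamSketch {
    private boolean closed = false;

    /** Assumption: a failed write leaves the stream closed for good. */
    void write(String record, boolean injectFailure) throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        if (injectFailure) {
            closed = true;
            throw new IOException("write failed");
        }
    }

    void sync() throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
    }
}

class FailureModeSketch {
    /** Mimics the recovery path described above: on a failed write, force a sync. */
    static String writeThenForceSync(ClosingStreamSketch out, String record, boolean injectFailure) {
        try {
            out.write(record, injectFailure);
            return "written";
        } catch (IOException writeError) {
            try {
                // The stream was closed by the failed write, so this sync cannot succeed.
                out.sync();
                return "tuple failed, sync ok";
            } catch (IOException syncError) {
                // In the scenario described, this is what takes the worker down.
                throw new RuntimeException("sync failed after write failure", syncError);
            }
        }
    }
}
```

      Under these assumptions, the "tuple failed, sync ok" branch is unreachable after a failed write: the forced sync always hits the closed stream and escalates to a RuntimeException.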

      The current code "works" and eventually will recover from these issues, but it may take a while. It also means that we are likely to have more data loss than needed for some output formats.

      I would suggest that we try to recover from RuntimeExceptions in the same way that we are trying to recover from IOExceptions now. I would also suggest that we handle the special case where tupleBatch.size() is 0 but we got an IOException from the writer. In that case the forceSync will not happen, so tuples will continue to fail until the sync policy decides to sync, at which point the worker will crash and then recover.
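
      One possible shape for this suggestion is sketched below. It is not the AbstractHdfsBolt code and does not use the Storm API: SketchWriter, RecoveringBoltSketch, the string tuples, and the count-based sync are all hypothetical simplifications. The design choice it illustrates, which is an assumption and not something this issue prescribes, is to treat IOException and RuntimeException the same way, fail the current tuple plus anything unsynced, and discard the broken writer instead of syncing it, regardless of whether the batch is empty.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical writer abstraction; not the storm-hdfs Writer interface. */
interface SketchWriter {
    void write(String tuple) throws IOException;
    void sync() throws IOException;
}

/** Hypothetical, simplified bolt illustrating the suggested recovery. */
class RecoveringBoltSketch {
    private final Supplier<SketchWriter> writerFactory;
    private final int syncEvery; // stand-in for the sync policy
    private final List<String> tupleBatch = new ArrayList<>();
    private SketchWriter writer;

    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    RecoveringBoltSketch(Supplier<SketchWriter> writerFactory, int syncEvery) {
        this.writerFactory = writerFactory;
        this.syncEvery = syncEvery;
    }

    void execute(String tuple) {
        boolean added = false;
        try {
            if (writer == null) {
                writer = writerFactory.get(); // fresh writer after a failure
            }
            writer.write(tuple);
            tupleBatch.add(tuple);
            added = true;
            if (tupleBatch.size() >= syncEvery) {
                writer.sync();
                acked.addAll(tupleBatch);
                tupleBatch.clear();
            }
        } catch (IOException | RuntimeException e) {
            // Same recovery for both exception types: no sync on the broken
            // writer, and the empty-batch case needs no special handling.
            if (!added) {
                failed.add(tuple);
            }
            failed.addAll(tupleBatch); // unsynced tuples are replayed upstream
            tupleBatch.clear();
            writer = null; // the next tuple gets a new writer
        }
    }
}
```

      In this sketch a failure costs one writer and the unsynced batch, and the next tuple proceeds on a new writer without a worker restart. Whether failing the whole unsynced batch is acceptable would depend on the output format in use.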

          People

            Assignee: Unassigned
            Reporter: Robert Joseph Evans (revans2)
            Votes: 0
            Watchers: 1
