Flume / FLUME-1110

HDFS Sink throws IllegalStateException when flume-daemon shuts down

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: v1.1.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      When using the HDFS sink, if you shut down the daemon (sudo /etc/init.d/flume-ng-node stop), an IllegalStateException appears in the log (/var/log/flume-ng/flume.log).

      2012-04-06 10:44:19,912 ERROR hdfs.HDFSEventSink: Error calling org.apache.flume.sink.hdfs.HDFSEventSink$4@32091738
      java.lang.IllegalStateException: Shutdown in progress
      at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
      at java.lang.Runtime.addShutdownHook(Runtime.java:192)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1607)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1579)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
      at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
      at org.apache.flume.sink.hdfs.BucketWriter.renameBucket(BucketWriter.java:196)
      at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:122)
      at org.apache.flume.sink.hdfs.HDFSEventSink$4.call(HDFSEventSink.java:440)
      at org.apache.flume.sink.hdfs.HDFSEventSink$4.call(HDFSEventSink.java:436)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      2012-04-06 10:44:19,927 INFO source.SyslogTcpSource: Syslog TCP Source stopping...
      2012-04-06 10:44:19,927 INFO source.SyslogTcpSource: Metrics:{ name:null counters:{events.success=11002} }

      1. FLUME-1110.patch
        0.7 kB
        Prasad Mujumdar

        Issue Links

          Activity

          Prasad Mujumdar created issue -
          Prasad Mujumdar made changes -
          Field Original Value New Value
          Comment [ Logged upstream Jira FLUME-1110 ]
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4681/
          -----------------------------------------------------------

          Review request for Flume and Arvind Prabhakar.

          Summary
          -------

          The sink runner's stop method first calls stop() on the underlying sink and then shuts down the PollingRunner thread. If that thread is in the middle of process(), this leads to race conditions between the sink's process() and stop().
          Rather than making every sink handle concurrent process() and stop() calls, it's safer to shut down the runner thread first and then stop the sink.

          This addresses bug FLUME-1110.
          https://issues.apache.org/jira/browse/FLUME-1110
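
The ordering described in the summary can be sketched as follows. This is a minimal illustration, not the contents of FLUME-1110.patch: DemoSink, DemoRunner, and runDemo are hypothetical names, and the real SinkRunner and Sink classes differ. The point is only that the runner's stop() signals and joins the polling thread before it calls stop() on the sink, so process() and stop() never overlap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class StopOrderSketch {
    /** Hypothetical stand-in for a sink; this is not Flume's Sink interface. */
    interface DemoSink {
        void process() throws InterruptedException;
        void stop();
    }

    /** Hypothetical runner that polls the sink on its own thread. */
    static class DemoRunner {
        private final DemoSink sink;
        private final AtomicBoolean shouldStop = new AtomicBoolean(false);
        private Thread thread;

        DemoRunner(DemoSink sink) {
            this.sink = sink;
        }

        void start() {
            thread = new Thread(() -> {
                while (!shouldStop.get()) {
                    try {
                        sink.process();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "demo-sink-runner");
            thread.start();
        }

        void stop() throws InterruptedException {
            // 1. Ask the polling thread to finish and wait until it has exited.
            shouldStop.set(true);
            thread.interrupt();
            thread.join();
            // 2. Only now stop the sink; no process() call can be in flight.
            sink.stop();
        }
    }

    /** Runs the runner briefly and returns the recorded event order. */
    static List<String> runDemo() throws InterruptedException {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        DemoSink sink = new DemoSink() {
            public void process() throws InterruptedException {
                events.add("process-begin");
                Thread.sleep(20); // simulated work
                events.add("process-end");
            }

            public void stop() {
                events.add("sink-stop");
            }
        };
        DemoRunner runner = new DemoRunner(sink);
        runner.start();
        Thread.sleep(50);
        runner.stop();
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

In this sketch "sink-stop" is always the last recorded event, because the runner thread has been joined before the sink is stopped. An interrupted process() call may leave a "process-begin" without a matching "process-end", which is acceptable for the illustration.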

          Diffs
          -----

          flume-ng-core/src/main/java/org/apache/flume/SinkRunner.java e73c09b

          Diff: https://reviews.apache.org/r/4681/diff

          Testing
          -------

          full regression test run

          Thanks,

          Prasad

          Brock Noland made changes -
          Link This issue is duplicated by FLUME-1034 [ FLUME-1034 ]
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4681/#review6789
          -----------------------------------------------------------

          I think the change makes sense, but I am not sure if it solves the problem from the JIRA? From what I can tell about the error, it looks like HDFS is trying to add a shutdown hook after the shutdown has started.

          • Brock
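
The behaviour Brock describes can be reproduced without Hadoop. The sketch below uses only java.lang.Runtime; ShutdownHookDemo and tryRegisterHook are hypothetical names chosen for illustration. It tries to register a shutdown hook once during normal operation and once from inside a shutdown hook, which is roughly the situation in the stack trace, where FileSystem$Cache.getInternal calls Runtime.addShutdownHook while the JVM is already shutting down.

```java
public class ShutdownHookDemo {
    /**
     * Tries to register (and immediately unregister) a no-op shutdown hook.
     * Returns null on success, or the exception message if the JVM refuses.
     */
    static String tryRegisterHook() {
        Thread hook = new Thread(() -> { });
        try {
            Runtime.getRuntime().addShutdownHook(hook);
            Runtime.getRuntime().removeShutdownHook(hook);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Normal operation: registration is allowed.
        System.out.println("before shutdown: " + tryRegisterHook());

        // Once main returns, the JVM starts its shutdown sequence and runs this hook;
        // the registration attempt inside it is rejected with IllegalStateException.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
            System.out.println("during shutdown: " + tryRegisterHook())));
    }
}
```

Running main should show the first attempt succeeding and the second one failing with the same exception type as in the log above.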

          On 2012-04-09 06:54:55, Prasad Mujumdar wrote:

          > -----------------------------------------------------------
          > This is an automatically generated e-mail. To reply, visit:
          > https://reviews.apache.org/r/4681/
          > -----------------------------------------------------------
          >
          > (Updated 2012-04-09 06:54:55)
          >
          > Review request for Flume and Arvind Prabhakar.
          >
          > Summary
          > -------
          >
          > The sink runner's stop method first calls stop() on the underlying sink and then shuts down the PollingRunner thread. If that thread is in the middle of process(), this leads to race conditions between the sink's process() and stop().
          > Rather than making every sink handle concurrent process() and stop() calls, it's safer to shut down the runner thread first and then stop the sink.
          >
          > This addresses bug FLUME-1110.
          > https://issues.apache.org/jira/browse/FLUME-1110
          >
          > Diffs
          > -----
          >
          > flume-ng-core/src/main/java/org/apache/flume/SinkRunner.java e73c09b
          >
          > Diff: https://reviews.apache.org/r/4681/diff
          >
          > Testing
          > -------
          >
          > full regression test run
          >
          > Thanks,
          >
          > Prasad

          jiraposter@reviews.apache.org added a comment -

          On 2012-04-09 11:10:19, Brock Noland wrote:

          > I think the change makes sense, but I am not sure if it solves the problem from the JIRA? From what I can tell about the error, it looks like HDFS is trying to add a shutdown hook after the shutdown has started.

          The roller can cause the file to be closed during process(), and stop() also closes the file. It looked like two threads were trying to close the same file simultaneously.

          • Prasad
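
To make the hazard concrete, here is a small sketch of two threads closing the same writer at once. It is not what the patch does (the patch reorders the runner's stop sequence instead), and DemoWriter is a hypothetical class, not Flume's BucketWriter. It shows one defensive technique a sink could use, an AtomicBoolean guard that lets the close work run only once however many threads call close().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedCloseSketch {
    /** Hypothetical writer; this is not Flume's BucketWriter. */
    static class DemoWriter {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        final AtomicInteger renames = new AtomicInteger();

        void close() {
            // Only the first caller wins the compareAndSet and performs the close work.
            if (closed.compareAndSet(false, true)) {
                renames.incrementAndGet(); // stands in for flushing and renaming the temp file
            }
        }
    }

    /** Calls close() from two threads released together; returns how often the close work ran. */
    static int closeFromTwoThreads() throws InterruptedException {
        DemoWriter writer = new DemoWriter();
        CountDownLatch start = new CountDownLatch(1);
        Runnable closer = () -> {
            try {
                start.await();
                writer.close();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread roller = new Thread(closer, "roller");
        Thread stopper = new Thread(closer, "stopper");
        roller.start();
        stopper.start();
        start.countDown(); // release both threads at the same moment
        roller.join();
        stopper.join();
        return writer.renames.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close work ran " + closeFromTwoThreads() + " time(s)");
    }
}
```

Note that with this guard the losing caller returns immediately, possibly before the winner has finished closing. That is one reason stopping the runner thread first, as proposed in the review request, is the simpler option.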

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4681/#review6807
          -----------------------------------------------------------

          Ship it!

          +1

          @Brock will keep an eye on the issue and if it resurfaces after this fix, will open a new jira.

          • Arvind

          jiraposter@reviews.apache.org added a comment -

          On 2012-04-09 22:35:44, Arvind Prabhakar wrote:

          > +1
          >
          > @Brock will keep an eye on the issue and if it resurfaces after this fix, will open a new jira.

          Prasad - please attach the patch to the Jira

          • Arvind

          Prasad Mujumdar made changes -
          Attachment FLUME-1110.patch [ 12522031 ]
          Prasad Mujumdar made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Arvind Prabhakar added a comment -

          Patch committed. Thanks Prasad!

          Arvind Prabhakar made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in flume-trunk #166 (See https://builds.apache.org/job/flume-trunk/166/)
          FLUME-1110. HDFS Sink throws IllegalStateException when flume shuts down.

          (Prasad Mujumdar via Arvind Prabhakar) (Revision 1311517)

          Result = SUCCESS
          arvind : http://svn.apache.org/viewvc/?view=rev&rev=1311517
          Files :

          • /incubator/flume/trunk/flume-ng-core/src/main/java/org/apache/flume/SinkRunner.java
          Brock Noland added a comment -

          I am still getting this with trunk:

          12/04/18 22:16:30 ERROR hdfs.HDFSEventSink: Error calling org.apache.flume.sink.hdfs.HDFSEventSink$4@5dd68001
          java.lang.IllegalStateException: Shutdown in progress
          	at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
          	at java.lang.Runtime.addShutdownHook(Runtime.java:192)
          	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1607)
          	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1579)
          	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
          	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
          	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:212)
          	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
          	at org.apache.flume.sink.hdfs.BucketWriter.renameBucket(BucketWriter.java:201)
          	at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:127)
          	at org.apache.flume.sink.hdfs.HDFSEventSink$4.call(HDFSEventSink.java:442)
          	at org.apache.flume.sink.hdfs.HDFSEventSink$4.call(HDFSEventSink.java:1)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:662)
          
          Brock Noland made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Mike Percy made changes -
          Fix Version/s v1.3.0 [ 12322140 ]
          Fix Version/s v1.2.0 [ 12320243 ]
          Brock Noland made changes -
          Fix Version/s v1.4.0 [ 12323372 ]
          Fix Version/s v1.3.0 [ 12322140 ]
          Hari Shreedharan added a comment -

          Is this still an issue?

          Mike Percy added a comment -

          This was fixed a long time ago, related to disabling the shutdown hook.

          Mike Percy made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Cannot Reproduce [ 5 ]
          Mike Percy made changes -
          Fix Version/s v1.4.0 [ 12323372 ]

            People

            • Assignee:
              Prasad Mujumdar
              Reporter:
              Prasad Mujumdar
            • Votes:
              0
              Watchers:
              3
