Flume
  1. Flume
  2. FLUME-1930

Inflights should clean up executors on close.

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.4.0
    • Component/s: None
    • Labels:
      None

      Description

      Looks like the JVM hang Juhani saw in FLUME-1929 was due to the fact that InflightEventWrapper has executors that are not cleaned up

      1. stack-dump
        4 kB
        Juhani Connolly
      2. FLUME-1930-1.patch
        5 kB
        Hari Shreedharan
      3. FLUME-1930.patch
        5 kB
        Hari Shreedharan

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        1h 8m 1 Hari Shreedharan 27/Feb/13 19:56
        Patch Available Patch Available Resolved Resolved
        14d 11h 3m 1 Juhani Connolly 14/Mar/13 06:59
        Mike Percy made changes -
        Fix Version/s v1.4.0 [ 12323372 ]
        Fix Version/s v1.3.0 [ 12322140 ]
        Mike Percy made changes -
        Fix Version/s v1.3.0 [ 12322140 ]
        Hide
        Hudson added a comment -

        Integrated in flume-trunk #376 (See https://builds.apache.org/job/flume-trunk/376/)
        FLUME-1930: Inflights should clean up executors on close (Revision 9204456eee8522201649b31a949a5a77710c1b2e)

        Result = ABORTED
        juhani_connolly : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=9204456eee8522201649b31a949a5a77710c1b2e
        Files :

        • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
        • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
        • flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java
        • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestLog.java
        Show
        Hudson added a comment - Integrated in flume-trunk #376 (See https://builds.apache.org/job/flume-trunk/376/ ) FLUME-1930 : Inflights should clean up executors on close (Revision 9204456eee8522201649b31a949a5a77710c1b2e) Result = ABORTED juhani_connolly : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=9204456eee8522201649b31a949a5a77710c1b2e Files : flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestLog.java
        Juhani Connolly made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Juhani Connolly added a comment -

        Confirmed working and committed, thanks Hari!

        Show
        Juhani Connolly added a comment - Confirmed working and committed, thanks Hari!
        Hari Shreedharan made changes -
        Attachment FLUME-1930-1.patch [ 12573683 ]
        Hide
        Hari Shreedharan added a comment -

        Removed an accidental change to hbase sink.

        Show
        Hari Shreedharan added a comment - Removed an accidental change to hbase sink.
        Hari Shreedharan made changes -
        Attachment FLUME-1930-1.patch [ 12573682 ]
        Hide
        Hari Shreedharan added a comment -

        If the checkpoint contents are present, the checkpoint rebuilder does not run (you can see this in the main function - it will simply bail and not startup at all.

        Show
        Hari Shreedharan added a comment - If the checkpoint contents are present, the checkpoint rebuilder does not run (you can see this in the main function - it will simply bail and not startup at all.
        Hari Shreedharan made changes -
        Attachment FLUME-1930-1.patch [ 12573682 ]
        Hide
        Hari Shreedharan added a comment -

        This patch should get rid of the hang.

        Show
        Hari Shreedharan added a comment - This patch should get rid of the hang.
        Hide
        Juhani Connolly added a comment - - edited

        It's been hung in the same state for over an hour. I'm going to try and get a full day where I can properly go over all the file channel source to understand the details of how it's working, but right now I'm not really able to say why this is happening.

        Incidentally, this happens only when I delete the checkpoint directory contents

        Show
        Juhani Connolly added a comment - - edited It's been hung in the same state for over an hour. I'm going to try and get a full day where I can properly go over all the file channel source to understand the details of how it's working, but right now I'm not really able to say why this is happening. Incidentally, this happens only when I delete the checkpoint directory contents
        Hide
        Hari Shreedharan added a comment -

        Thinking about it, we could just have the inflights get written out in the same thread. I have never seen it be more than a few KB. Also, those need to be written out during the checkpoint for FLUME-1516 anyway (that change is already in my patch for that jira). I will just make that change here.

        Show
        Hari Shreedharan added a comment - Thinking about it, we could just have the inflights get written out in the same thread. I have never seen it be more than a few KB. Also, those need to be written out during the checkpoint for FLUME-1516 anyway (that change is already in my patch for that jira). I will just make that change here.
        Hide
        Hari Shreedharan added a comment -

        Also, do you see any exceptions in the logs?

        Show
        Hari Shreedharan added a comment - Also, do you see any exceptions in the logs?
        Hide
        Hari Shreedharan added a comment -

        Juhani Connolly It looks like my analysis was correct. The thread rename happened in this patch itself (inflights-sync-thread). Interestingly I actually call shutdownNow on the executor, which should kill the executor anyway (since it is simply waiting for more events). Did you try waiting for some time (like a couple minutes?). I think it should just die.

        Show
        Hari Shreedharan added a comment - Juhani Connolly It looks like my analysis was correct. The thread rename happened in this patch itself (inflights-sync-thread). Interestingly I actually call shutdownNow on the executor, which should kill the executor anyway (since it is simply waiting for more events). Did you try waiting for some time (like a couple minutes?). I think it should just die.
        Juhani Connolly made changes -
        Priority Major [ 3 ] Minor [ 4 ]
        Juhani Connolly made changes -
        Attachment stack-dump [ 12573666 ]
        Hide
        Juhani Connolly added a comment -

        Unfortunately, this is still hanging, though the thread names have changed(They might have been renamed in a recent patch?)

        Tested against latest trunk with 1929 applied

        Show
        Juhani Connolly added a comment - Unfortunately, this is still hanging, though the thread names have changed(They might have been renamed in a recent patch?) Tested against latest trunk with 1929 applied
        Hide
        Hari Shreedharan added a comment -

        Brock Noland - I ignored that particular case on purpose. This is mainly just to make sure the executors are cleaned up immediately - they seem to be cleaned up eventually anyway. Adding try/catch for each just made the close method verbose and ugly - so I skipped it. If you think it is necessary, I will add it.

        Show
        Hari Shreedharan added a comment - Brock Noland - I ignored that particular case on purpose. This is mainly just to make sure the executors are cleaned up immediately - they seem to be cleaned up eventually anyway. Adding try/catch for each just made the close method verbose and ugly - so I skipped it. If you think it is necessary, I will add it.
        Hide
        Brock Noland added a comment -

        If backingStore.close() throws an IOException than inflightPuts.close(); and the takes one are not called.

        Show
        Brock Noland added a comment - If backingStore.close() throws an IOException than inflightPuts.close(); and the takes one are not called.
        Hari Shreedharan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Hari Shreedharan [ hshreedharan ]
        Hari Shreedharan made changes -
        Field Original Value New Value
        Attachment FLUME-1930.patch [ 12571258 ]
        Hide
        Hari Shreedharan added a comment -

        Note that in the normal case, the executors do get cleaned up by the JVM in about a minute.

        Show
        Hari Shreedharan added a comment - Note that in the normal case, the executors do get cleaned up by the JVM in about a minute.
        Hari Shreedharan created issue -

          People

          • Assignee:
            Hari Shreedharan
            Reporter:
            Hari Shreedharan
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development