Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: v1.4.0
    • Fix Version/s: v1.3.0
    • Component/s: None
    • Labels:
      None

      Description

      When HDFS paths are date dependent, many file handles can be left open for a long time with nothing being written to them.

      The idea here is to track the last update of each bucketWriter and close idle writers once they pass a configured timeout.
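
      The idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class IdleWriterTracker, its method names, and the example paths are all hypothetical, and treating a timeout of 0 or less as "disabled" is an assumption. Writers are represented by their path only, and "closing" a writer is modelled as removing it from the map and returning its path.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: remembers the last write time per open path and closes idle ones. */
class IdleWriterTracker {
    // Assumption: a timeout of 0 or less means idle closing is disabled.
    private final long idleTimeoutMillis;
    // Path of each open writer mapped to the time of its last write.
    private final Map<String, Long> lastWrite = new LinkedHashMap<>();

    IdleWriterTracker(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Call on every write so that an active writer is never treated as idle. */
    synchronized void recordWrite(String path, long nowMillis) {
        lastWrite.put(path, nowMillis);
    }

    /** Closes every writer idle for at least the timeout; returns the closed paths. */
    synchronized List<String> closeIdle(long nowMillis) {
        List<String> closed = new ArrayList<>();
        if (idleTimeoutMillis <= 0) {
            return closed; // feature disabled
        }
        Iterator<Map.Entry<String, Long>> it = lastWrite.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= idleTimeoutMillis) {
                closed.add(entry.getKey()); // a real sink would close the file handle here
                it.remove();
            }
        }
        return closed;
    }

    synchronized int openCount() {
        return lastWrite.size();
    }
}
```

      Unlike a cap on the number of open files, this check looks at the last write time rather than the open time, so a writer that is old but still receiving events stays open.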

      1. FLUME-1660-7.patch
        14 kB
        Juhani Connolly


          Activity

          Hari Shreedharan added a comment -

          Juhani,

          I must admit I did not follow much of the discussion that happened on FLUME-1350 - and now there is too much to read. Why can't we really use the rollInterval for this? Once the rollInterval passes, don't we close the file? I am just concerned that we really end up having multiple configs which do almost the same thing. Or is this so that we can keep the rollInterval separate from the interval to consider a directory as "no longer written to?"

          Juhani Connolly added a comment -

          We use roll interval; however, it is not a fix applicable in all situations, and it is clumsy. For example, if people use size-based rolling along with filenames that are built from header variables, files may stay open for an arbitrary amount of time.

          Closing idle files on a timer seems to me a far more elegant solution than forcing users to guess an arbitrary time after which to roll, or to limit open files (which close in order of oldest, even if the oldest is still active).

          I don't really think having the extra option hurts so long as documentation is good. I'm setting the default to inactive.

          Hari Shreedharan added a comment -

          Thanks for the explanation, Juhani. Makes sense. Thanks!

          Brock Noland added a comment -

          Marking patch available as a patch is on RB.

          Juhani Connolly added a comment -

          Adding patch from RB

          Juhani Connolly added a comment -

          Latest patch with unit tests and corner case fix

          Mike Percy added a comment -

          Patch committed to trunk and 1.4. Thanks Juhani!

          Rev: d7747cfac8f64704fb39924c365a8ac343244b40

          Note: I am going to leave it up to RM whether to backport this to 1.3.0. But I don't see any reason why not. If so, the Fix Version on this JIRA should be updated to v1.3.0.

          Brock Noland added a comment -

          Committed to 1.3.0


            People

            • Assignee:
              Juhani Connolly
              Reporter:
              Juhani Connolly
            • Votes:
              1
              Watchers:
              4