Flume / FLUME-2704

Configurable poll delay for spooling directory source

    Details

    • Flags:
      Patch

      Description

      The spooling directory (SpoolDir) source polls a directory for new files at a fixed interval. This interval (the poll delay) is currently hardcoded to 500 ms.
      Polling every 500 ms may be more frequent than some applications need. This JIRA makes the poll delay configurable.
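      For illustration, a spooling directory source that checks for new files once a minute might be configured as below. The property name pollDelay (in milliseconds, default 500) is the one listed for the spooling directory source in later versions of the Flume User Guide; the agent, source, and channel names and the directory path are hypothetical.

```
# Hypothetical agent "a1" with one spooling directory source and one memory channel
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/incoming
a1.sources.r1.channels = c1
# Check for new files every 60 seconds instead of the default 500 ms
a1.sources.r1.pollDelay = 60000
a1.channels.c1.type = memory
```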

      Attachment: FLUME-2704.patch (4 kB, Somin Mithraa)

        Issue Links

          Activity

          Somin Mithraa added a comment:

          This patch makes the pollDelay property configurable.
          The default value of 500 ms is retained.
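          A minimal sketch of the idea, assuming the source reads a pollDelay value from its configuration and passes it to a scheduled executor in place of the hardcoded 500 ms. The constant names POLL_DELAY and DEFAULT_POLL_DELAY are hypothetical (modeled on the style of SpoolDirectorySourceConfigurationConstants), a plain Map stands in for Flume's Context, and the positive-value check is an addition of this sketch; it is not the committed patch.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PollDelaySketch {
    // Hypothetical names, modeled on SpoolDirectorySourceConfigurationConstants
    static final String POLL_DELAY = "pollDelay";
    static final int DEFAULT_POLL_DELAY = 500; // milliseconds, the previously hardcoded value

    // Reads the poll delay from a key/value configuration, keeping 500 ms as the default
    static int pollDelay(Map<String, String> config) {
        String value = config.get(POLL_DELAY);
        if (value == null) {
            return DEFAULT_POLL_DELAY;
        }
        int delay = Integer.parseInt(value.trim());
        // scheduleWithFixedDelay rejects non-positive delays, so fail early with a clear message
        if (delay <= 0) {
            throw new IllegalArgumentException("pollDelay must be positive: " + delay);
        }
        return delay;
    }

    public static void main(String[] args) throws InterruptedException {
        int delay = pollDelay(Map.of(POLL_DELAY, "1000"));
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Each scan starts `delay` ms after the previous scan finishes
        executor.scheduleWithFixedDelay(
            () -> System.out.println("scanning spool directory"),
            0, delay, TimeUnit.MILLISECONDS);
        Thread.sleep(2500);
        executor.shutdownNow();
    }
}
```

          A fixed delay (rather than a fixed rate) is used here so that a slow scan never causes scans to pile up back to back.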

          Phil Scala added a comment:

          Thanks Somin, the idea seems reasonable to me, but could you elaborate on the issue this fixes for you? My concern is this: 500 ms is fast for checking for new files, but if you were seeing race conditions where a file was still being written when the poller noticed it, a longer delay would only make that problem less frequent, not eliminate it. We may need to focus on how files are placed into the spool directory.

          Somin Mithraa added a comment:

          Thanks for your interest in this issue. The idea is to make the poll delay configurable so that users can set it to suit their application. If new files arrive only once an hour, polling twice a second wastes resources, and the log file also grows large because a log message is generated twice a second. This has nothing to do with Flume seeing a file before it is completely written; for that issue we can use file-name excludes. This patch simply extends the configuration Flume already provides.
          In my opinion, the poll delay should have been configurable in the first place. Is there a reason it wasn't?
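          To put rough numbers on the overhead described above: with a fixed delay between scans, the number of scans in a time window is approximately the window length divided by the delay. The one-hour window and the 60-second alternative delay below are illustrative values, not figures from this issue.

```java
class PollCount {
    // Approximate number of directory scans in a time window for a fixed poll delay.
    // Ignores the time each scan takes, which only lengthens the interval between scans.
    static long scansPerWindow(long windowMs, long pollDelayMs) {
        return windowMs / pollDelayMs;
    }

    public static void main(String[] args) {
        long oneHourMs = 60L * 60 * 1000;
        System.out.println(scansPerWindow(oneHourMs, 500));    // 7200 scans with the old hardcoded delay
        System.out.println(scansPerWindow(oneHourMs, 60_000)); // 60 scans with a one-minute delay
    }
}
```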

          Johny Rufus added a comment:

          +1, the use case seems reasonable

          Phil Scala added a comment:

          Thanks Somin, yep, your explanation makes sense.

          Somin Mithraa added a comment:

          Phil Scala and Johny Rufus, could you review the attached patch and share your comments? I have added a link to the Review Board request in the Issue Links section.

          Roshan Naik added a comment:

          Johny Rufus, can you commit this if it looks good?

          Johny Rufus added a comment:

          Roshan Naik, the changes look good. The patch doesn't apply cleanly, so I had to rebase it; I will commit it as soon as I have run the tests.

          ASF subversion and git services added a comment:

          Commit af63d38fada97a06c542ad875ef31ea3e74d53cc in flume's branch refs/heads/trunk from Johny Rufus
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=af63d38 ]

          FLUME-2704. Configurable poll delay for spooling directory source

          (Somin Mithraa via Johny Rufus)

          ASF subversion and git services added a comment:

          Commit a6b55f18350cb458f8eb23b95c1cff0812960673 in flume's branch refs/heads/flume-1.7 from Johny Rufus
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=a6b55f1 ]

          FLUME-2704. Configurable poll delay for spooling directory source

          (Somin Mithraa via Johny Rufus)

          Johny Rufus added a comment:

          Thanks, Somin Mithraa, for the patch!

          Hudson added a comment:

          UNSTABLE: Integrated in Flume-trunk-hbase-1 #147 (See https://builds.apache.org/job/Flume-trunk-hbase-1/147/)
          FLUME-2704. Configurable poll delay for spooling directory source (johnyrufus: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=af63d38fada97a06c542ad875ef31ea3e74d53cc)

          • flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java

            People

            • Assignee:
              Somin Mithraa
              Reporter:
              Somin Mithraa
            • Votes:
              0
              Watchers:
              6
