Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Not a Problem
    • Affects Version/s: v1.0.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      Flume NG currently doesn't have a TailSource as existed in OG

      1. flume-tail.patch
        19 kB
        Juhani Connolly
      2. flume-931.patch
        22 kB
        Juhani Connolly

        Activity

        Hide
        Juhani Connolly added a comment - - edited

        Here's an inital pass I took at this, based on the OG source.
        I've just reused the Cursor as is and based the general behavior around that in OG, along with simple testing.
        I'd like to iron out any details about any further configuration needed, or any problems, and then I'll add on the custom delimiter and try to put together a unit test.

        Show
        Juhani Connolly added a comment - - edited Here's an inital pass I took at this, based on the OG source. I've just reused the Cursor as is and based the general behavior around that in OG, along with simple testing. I'd like to iron out any details about any further configuration needed, or any problems, and then I'll add on the custom delimiter and try to put together a unit test.
        Hide
        Juhani Connolly added a comment -

        Have an issue with shutdown which I fixed along with multiple cursor support. I'll post another patch on monday once I've tested it properly

        Show
        Juhani Connolly added a comment - Have an issue with shutdown which I fixed along with multiple cursor support. I'll post another patch on monday once I've tested it properly
        Hide
        Bruno Mahé added a comment - - edited

        Some notes also:

        • The license header is wrong. It needs to be updated for Apache. I would recommend to use one from some other files form flume-ng
        • It looks like the filename is a necessary parameter for that source. So you may want to add a Preconditions.checkState statement.
        Show
        Bruno Mahé added a comment - - edited Some notes also: The license header is wrong. It needs to be updated for Apache. I would recommend to use one from some other files form flume-ng It looks like the filename is a necessary parameter for that source. So you may want to add a Preconditions.checkState statement.
        Hide
        Otis Gospodnetic added a comment -

        I wonder if now is a good opportunity to implement tail functionality that I remember a lot of people asking about even back when Flume was on Github - things like making sure that when log files roll over data is not missed, etc. (there was a number of good threads on this topic on the old Flume ML on Google Groups)

        Show
        Otis Gospodnetic added a comment - I wonder if now is a good opportunity to implement tail functionality that I remember a lot of people asking about even back when Flume was on Github - things like making sure that when log files roll over data is not missed, etc. (there was a number of good threads on this topic on the old Flume ML on Google Groups)
        Hide
        Juhani Connolly added a comment -

        I'll try and look over the old ML archives after I get the base implementation down.

        Show
        Juhani Connolly added a comment - I'll try and look over the old ML archives after I get the base implementation down.
        Hide
        Juhani Connolly added a comment - - edited

        Posted a properly working version.
        The previous version had a couple thread synchronisation issues that should be fixed in this.
        Also changed the headers on both file to the apache header. However, the cursor.java is actually more or less the same as the one from OG(the only real change are with the packaging of events), so I'm not entirely sure of what the proper header for that is?

        Also added in multiple file support.

        Things not implemented that were in OG(I'll get around to these once I'm happy things work well as is. Running an initial test on a few G a day of data flow):

        • Custom delimiters
        • directory tailing

        An area of concern that remains is with the handling of interrupts:

        • In process() i'm currently just returning Status.BACKOFF if the poll gets interrupted but thinking this may have to be changed to an EventDeliveryException. I'm not particularly happy with either choice though and would be curious of other peoples opionions on the matter.
        Show
        Juhani Connolly added a comment - - edited Posted a properly working version. The previous version had a couple thread synchronisation issues that should be fixed in this. Also changed the headers on both file to the apache header. However, the cursor.java is actually more or less the same as the one from OG(the only real change are with the packaging of events), so I'm not entirely sure of what the proper header for that is? Also added in multiple file support. Things not implemented that were in OG(I'll get around to these once I'm happy things work well as is. Running an initial test on a few G a day of data flow): Custom delimiters directory tailing An area of concern that remains is with the handling of interrupts: In process() i'm currently just returning Status.BACKOFF if the poll gets interrupted but thinking this may have to be changed to an EventDeliveryException. I'm not particularly happy with either choice though and would be curious of other peoples opionions on the matter.
        Hide
        E. Sammer added a comment -

        I haven't yet looked at the patch, but the tail source in 0.9.x was the bane of everyone's existence. It's much harder than it sounds (e.g. we need to do things like track inodes in order to catch when files are rotated which is difficult in java without JNI) and I'm extremely disinclined to re-include any tail functionality unless those issues are resolved. Absolutely any ports of the original, very problematic code aren't going to be a welcome addition to 1.x. All of that said, I'll try and take a look at the code and give it a fair shake, but if it is simply the same thing from 0.9.x (albeit cleaned up) then there's (almost) no advantage over exec source with tail -F.

        Show
        E. Sammer added a comment - I haven't yet looked at the patch, but the tail source in 0.9.x was the bane of everyone's existence. It's much harder than it sounds (e.g. we need to do things like track inodes in order to catch when files are rotated which is difficult in java without JNI) and I'm extremely disinclined to re-include any tail functionality unless those issues are resolved. Absolutely any ports of the original, very problematic code aren't going to be a welcome addition to 1.x. All of that said, I'll try and take a look at the code and give it a fair shake, but if it is simply the same thing from 0.9.x (albeit cleaned up) then there's (almost) no advantage over exec source with tail -F.
        Hide
        Juhani Connolly added a comment -

        I did notice the cursor code was pretty crazy, and that kind of factored into my decision to leave it alone as it seemed to work(more or less)... So I can certainly see where you're coming from.
        If you think it can be put in with a more thorough rewrite I could probably look into that, but like you were saying without access to inodes, checking for log rotations is certainly going to be awkward.

        Worst case, I can always change the source to work with the plugin framework and put it up on github or something for those people that do want the small advantages over using an execSource.

        Show
        Juhani Connolly added a comment - I did notice the cursor code was pretty crazy, and that kind of factored into my decision to leave it alone as it seemed to work(more or less)... So I can certainly see where you're coming from. If you think it can be put in with a more thorough rewrite I could probably look into that, but like you were saying without access to inodes, checking for log rotations is certainly going to be awkward. Worst case, I can always change the source to work with the plugin framework and put it up on github or something for those people that do want the small advantages over using an execSource.
        Hide
        Juhani Connolly added a comment -

        Fwiw, I am willing to update this patch against the recent developments if people feel it is needed.

        With that being said, I actually now subscribe to the belief that a better approach to tailing would be to use a separate client(written in a language like c or python that can actually track inodes) that sends the data by thrift/avro(there are plenty of such implementations for scribe). So considering that, unless someone really wants this, I think it may be best to close this

        Show
        Juhani Connolly added a comment - Fwiw, I am willing to update this patch against the recent developments if people feel it is needed. With that being said, I actually now subscribe to the belief that a better approach to tailing would be to use a separate client(written in a language like c or python that can actually track inodes) that sends the data by thrift/avro(there are plenty of such implementations for scribe). So considering that, unless someone really wants this, I think it may be best to close this
        Hide
        Arvind Prabhakar added a comment -

        +1 on a native client for tailing files.

        I agree with Juhani that we should close this issue for now and if necessary open a separate issue to implement the tail client in a different language.

        Show
        Arvind Prabhakar added a comment - +1 on a native client for tailing files. I agree with Juhani that we should close this issue for now and if necessary open a separate issue to implement the tail client in a different language.
        Hide
        Juhani Connolly added a comment -

        Consensus on this is that tail should either be done with exec source, or with a separate client sending avro messages(possibly in a language that can track inodes)

        Show
        Juhani Connolly added a comment - Consensus on this is that tail should either be done with exec source, or with a separate client sending avro messages(possibly in a language that can track inodes)
        Hide
        Frank Grimes added a comment - - edited

        Have you considered looking at http://jpathwatch.wordpress.com ?
        It's apparently API compatible with Java 7's WatchService API but only requires Java 5 and up.

        Show
        Frank Grimes added a comment - - edited Have you considered looking at http://jpathwatch.wordpress.com ? It's apparently API compatible with Java 7's WatchService API but only requires Java 5 and up.
        Hide
        Otis Gospodnetic added a comment -

        Thanks for that jpathwatch URL!

        This sounds nice:
        "It uses the host platform’s native OS functions to achieve this to avoid polling."

        But in FlumeNG I think the only tailing option is the native "tail -f"?

        Show
        Otis Gospodnetic added a comment - Thanks for that jpathwatch URL! This sounds nice: "It uses the host platform’s native OS functions to achieve this to avoid polling." But in FlumeNG I think the only tailing option is the native "tail -f"?

          People

          • Assignee:
            Unassigned
            Reporter:
            Juhani Connolly
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development