Details

    • Type: Improvement Improvement
    • Status: Reopened
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Data Collection
    • Labels:
      None

      Description

      Currently, we keep files open in order to detect rotation. This is wasteful of file handles, doesn't work correctly across reboots, and I think we can do better.

        Activity

        Hide
        Ari Rabkin added a comment -

        There's a subtle bug in how the new adaptor writes checkpoints. The last-mod date in the checkpoint reflects data SENT, not data WRITTEN. So after a collector crash, this adaptor will resume at the wrong place. Fix is for the adaptor to keep a mapping from send offset to last-mod-date, listen to commit reports, and write accordingly.

        If other adaptors start needing to do this, we might need to pull this functionality out into a utility class.

        Show
        Ari Rabkin added a comment - There's a subtle bug in how the new adaptor writes checkpoints. The last-mod date in the checkpoint reflects data SENT, not data WRITTEN. So after a collector crash, this adaptor will resume at the wrong place. Fix is for the adaptor to keep a mapping from send offset to last-mod-date, listen to commit reports, and write accordingly. If other adaptors start needing to do this, we might need to pull this functionality out into a utility class.
        Hide
        Hudson added a comment -

        Integrated in Chukwa-trunk #235 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/235/)
        wrong version of
        . Use modification time to detect rotation.

        Show
        Hudson added a comment - Integrated in Chukwa-trunk #235 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/235/ ) wrong version of . Use modification time to detect rotation.
        Hide
        Eric Yang added a comment -

        The new adaptor works well. Thank you Ari.

        Show
        Eric Yang added a comment - The new adaptor works well. Thank you Ari.
        Hide
        Ari Rabkin added a comment -

        I just committed this. Eric, do you mean that the new adaptor worked well, or just that it didn't break anything?

        Show
        Ari Rabkin added a comment - I just committed this. Eric, do you mean that the new adaptor worked well, or just that it didn't break anything?
        Hide
        Eric Yang added a comment -

        +1 The current patch worked well on my machines for the last 3 days.

        Show
        Eric Yang added a comment - +1 The current patch worked well on my machines for the last 3 days.
        Hide
        Ari Rabkin added a comment -

        Inadvertently broke FileTailingAdaptor. Fixed.

        Show
        Ari Rabkin added a comment - Inadvertently broke FileTailingAdaptor. Fixed.
        Hide
        Eric Yang added a comment -

        Give me a day to test this on my end.

        Show
        Eric Yang added a comment - Give me a day to test this on my end.
        Hide
        Jerome Boulon added a comment -

        @Eric: I don't recommend to go down that road but if you want something safe, use the inode but again this will not be portable.

        Show
        Jerome Boulon added a comment - @Eric: I don't recommend to go down that road but if you want something safe, use the inode but again this will not be portable.
        Hide
        Ari Rabkin added a comment -

        By the way. I now have added, more intrusive tests. I want to commit this.

        Show
        Ari Rabkin added a comment - By the way. I now have added, more intrusive tests. I want to commit this.
        Hide
        Ari Rabkin added a comment -

        How often do log files get touched after they're rotated out?

        Show
        Ari Rabkin added a comment - How often do log files get touched after they're rotated out?
        Hide
        Eric Yang added a comment -

        My only concern is modification time is easily changed by doing touch on the file. It is a fragile approach.

        Show
        Eric Yang added a comment - My only concern is modification time is easily changed by doing touch on the file. It is a fragile approach.
        Hide
        Jerome Boulon added a comment -

        +1 on @Ari comment

        Show
        Jerome Boulon added a comment - +1 on @Ari comment
        Hide
        Ari Rabkin added a comment -

        I am unhappy with JNI-based approaches, since I think it'll be unpleasant to maintain and debug, particularly across platforms. So I would rather stick with modification dates if at all possible.

        Show
        Ari Rabkin added a comment - I am unhappy with JNI-based approaches, since I think it'll be unpleasant to maintain and debug, particularly across platforms. So I would rather stick with modification dates if at all possible.
        Hide
        Eric Yang added a comment -

        There is a package called jtux which provides posix system calls. It has UFile.s_stat.st_ctime. It's BSD license. This should fit our requirement.

        http://www.basepath.com/aup/jtux

        Show
        Eric Yang added a comment - There is a package called jtux which provides posix system calls. It has UFile.s_stat.st_ctime. It's BSD license. This should fit our requirement. http://www.basepath.com/aup/jtux
        Hide
        Ari Rabkin added a comment -

        I had originally wanted to use creation dates, but I don't think the java APIs offer a way to do that.

        Show
        Ari Rabkin added a comment - I had originally wanted to use creation dates, but I don't think the java APIs offer a way to do that.
        Hide
        Eric Yang added a comment -

        Instead of modification dates, it makes more sense to keep track of creation time. If the creation time has changed, it means there was a rotation, and we can sort out where to start tailing base on creation time. When log rotation happens, the new filename does not change the creation time of the rotated log file.

        Show
        Eric Yang added a comment - Instead of modification dates, it makes more sense to keep track of creation time. If the creation time has changed, it means there was a rotation, and we can sort out where to start tailing base on creation time. When log rotation happens, the new filename does not change the creation time of the rotated log file.
        Hide
        Ari Rabkin added a comment -

        Note that that's not actually exercising the new code. The new code very carefully does not change the behavior of any existing adaptors. Not yet.

        Show
        Ari Rabkin added a comment - Note that that's not actually exercising the new code. The new code very carefully does not change the behavior of any existing adaptors. Not yet.
        Hide
        Eric Yang added a comment -

        -bash-3.1$ telnet localhost 9093
        Trying 127.0.0.1...
        Connected to localhost.localdomain (127.0.0.1).
        Escape character is '^]'.
        list
        adaptor_1b0f8e15eedd2cb346aa8c1cfd4f35d9) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped mapred 0 /usr/local/chukwa/current/var/log/metrics/chukwa-mapred-mapred-1260413773627.log 5332706

        -bash-3.1$ ls -l chukwa-mapred-mapred-1260413773627.log
        rw-rr- 1 mapred users 168050 Dec 17 01:00 chukwa-mapred-mapred-1260413773627.log
        -bash-3.1$ ls -l chukwa-mapred-mapred-1260413773627.log.2009-12-16
        rw-rr- 1 mapred users 5332706 Dec 16 23:59 chukwa-mapred-mapred-1260413773627.log.2009-12-16

        It still doesn't rotate properly.

        Show
        Eric Yang added a comment - -bash-3.1$ telnet localhost 9093 Trying 127.0.0.1... Connected to localhost.localdomain (127.0.0.1). Escape character is '^]'. list adaptor_1b0f8e15eedd2cb346aa8c1cfd4f35d9) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped mapred 0 /usr/local/chukwa/current/var/log/metrics/chukwa-mapred-mapred-1260413773627.log 5332706 -bash-3.1$ ls -l chukwa-mapred-mapred-1260413773627.log rw-r r - 1 mapred users 168050 Dec 17 01:00 chukwa-mapred-mapred-1260413773627.log -bash-3.1$ ls -l chukwa-mapred-mapred-1260413773627.log.2009-12-16 rw-r r - 1 mapred users 5332706 Dec 16 23:59 chukwa-mapred-mapred-1260413773627.log.2009-12-16 It still doesn't rotate properly.
        Hide
        Ari Rabkin added a comment -

        nod My sense is that this patch is not quite ready to commit but that the overall approach of using modification dates to detect rotation is workable.

        Show
        Ari Rabkin added a comment - nod My sense is that this patch is not quite ready to commit but that the overall approach of using modification dates to detect rotation is workable.
        Hide
        Eric Yang added a comment -

        I deployed this patch on my cluster. Let me run it for 2 days to be sure.

        Show
        Eric Yang added a comment - I deployed this patch on my cluster. Let me run it for 2 days to be sure.
        Hide
        Ari Rabkin added a comment -

        Various refactorings elsewhere. Unit tests pass.

        Show
        Ari Rabkin added a comment - Various refactorings elsewhere. Unit tests pass.
        Hide
        Ari Rabkin added a comment -

        I am NOT proposing to increment a sequence id on the file names. I'm proposing to use the file mod date plus length to figure out which rotated files are how old. Hash of last record (or last 200 bytes or whatever) is also an option. My intent is to make the code as modular as possible so that we can try out a couple different approaches.

        Show
        Ari Rabkin added a comment - I am NOT proposing to increment a sequence id on the file names. I'm proposing to use the file mod date plus length to figure out which rotated files are how old. Hash of last record (or last 200 bytes or whatever) is also an option. My intent is to make the code as modular as possible so that we can try out a couple different approaches.
        Hide
        Jerome Boulon added a comment -

        @Ari, incrementing a seqId on the file name will work but then you need control over the file creation

        Show
        Jerome Boulon added a comment - @Ari, incrementing a seqId on the file name will work but then you need control over the file creation
        Hide
        Jerome Boulon added a comment -

        We can keep track of current offset and the last record and compare it.
        This should work assuming that the date should not be the same and if it is and the record is the same then for us it will be a duplicate anyway.

        Show
        Jerome Boulon added a comment - We can keep track of current offset and the last record and compare it. This should work assuming that the date should not be the same and if it is and the record is the same then for us it will be a duplicate anyway.
        Hide
        Ari Rabkin added a comment -

        My idea is as follows:

        1) Adaptors should keep track of the last-modified date of the file they're tailing.
        2) Each time through, get the list of files whose names match the file in question, ignoring trailing stuff after a final dot.
        (Possible rotation results)
        3) Process them, oldest-first. Break ties using longest-first, or some name-based heuristic.

        The end goal of this is that if file Foo is being tailed, and then when we look next, Foo is shorter and there's a Foo.1, we'll finish tailing Foo.1, and then start tailing Foo.

        Show
        Ari Rabkin added a comment - My idea is as follows: 1) Adaptors should keep track of the last-modified date of the file they're tailing. 2) Each time through, get the list of files whose names match the file in question, ignoring trailing stuff after a final dot. (Possible rotation results) 3) Process them, oldest-first. Break ties using longest-first, or some name-based heuristic. The end goal of this is that if file Foo is being tailed, and then when we look next, Foo is shorter and there's a Foo.1, we'll finish tailing Foo.1, and then start tailing Foo.

          People

          • Assignee:
            Ari Rabkin
            Reporter:
            Ari Rabkin
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Development