Flume / FLUME-252

Update Tail to get rid of races and truncation problems.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: v0.9.0, v0.9.1
    • Fix Version/s: v0.9.2
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      The first tail implementation used buffered readers and file readers. This caused problems because the read call blocked, so the source could not shut down properly.

      The second tail implementation was more closely based on GNU tail's C implementation but relies on RandomAccessFile. This version had problems with races (restarting from the beginning, FLUME-218) and truncation (new test in the FLUME-218 patch by Eric Sammer).

      The new approach will likely use NIO and non-blocking I/O to act cleanly, or possibly a JNI-based approach that reaches Unix system calls to get and follow file descriptors or inode numbers.
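
To make the NIO idea concrete, here is a minimal sketch of a polling cursor. It is not the code from the patch: the class name `TailCursorSketch` and the `poll()` method are hypothetical, and truncation is detected only by the file size dropping below the saved offset, which is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical cursor (not Flume's actual class): returns bytes appended since the last poll. */
class TailCursorSketch {
    private final Path path;
    private long position = 0; // offset of the next unread byte

    TailCursorSketch(Path path) {
        this.path = path;
    }

    /** Reads whatever is available right now and returns; it never waits for more data. */
    byte[] poll() throws IOException {
        if (!Files.exists(path)) {
            return new byte[0]; // file may be mid-rotation; try again on the next poll
        }
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size < position) {
                position = 0; // file shrank: assume truncation and restart from the top
            }
            if (size == position) {
                return new byte[0]; // nothing new
            }
            ByteBuffer buf = ByteBuffer.allocate((int) Math.min(size - position, 1 << 16));
            int n = ch.read(buf, position); // positional read, does not move the channel offset
            if (n <= 0) {
                return new byte[0];
            }
            position += n;
            byte[] out = new byte[n];
            buf.flip();
            buf.get(out);
            return out;
        }
    }
}
```

A size check alone still leaves a race: if the file is truncated and then grows past the saved offset between two polls, the sketch will not notice. Closing that gap would need something like the file descriptor or inode tracking mentioned above.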

          Activity

          Jonathan Hsieh added a comment -

          Closing released issues.

          Jonathan Hsieh added a comment -

          Committed.

          Jonathan Hsieh added a comment -

          This version significantly reduces the chances of data duplication
          encountered due to the FLUME-252 (races in tail) bug. However, it
          does not completely fix the problem. A workaround is to use
          'exec("tail -F <file>")' instead of the tail source on Linux/Unix
          systems. A new issue has been filed as FLUME-320 to continue tracking
          this problem.

          Jonathan Hsieh added a comment -

          Review is here: https://review.cloudera.org/r/981/
          I'm having problems uploading the patch right now.

          Jonathan Hsieh added a comment -

          FLUME-205 will be fixed by this patch. Here's a description of the problem I encountered: FLUME-261

          Jonathan Hsieh added a comment -

          Scratch FLUME-205. I tested with some Chinese characters (我是中國人) and the data did not get through properly. My guess is that reads in this tail are correct, but on output there is an endianness or output encoding issue. Will leave FLUME-205 open.

          Jonathan Hsieh added a comment - edited

          The solution I have addresses these problems with the following mechanisms:

          • FLUME-205: by using the NIO API and only using byte[] (never doing character encoding translations)
          • FLUME-248: added a method to the cursor that is called on close and on file rotation, as well as a test that fails if this is not done properly
          • FLUME-218: Passes the python script found in that test.

          This does not address FLUME-148.
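
The byte[]-only approach for FLUME-205 can be illustrated with a short sketch. The class `ByteLineSplitter` and its `feed()` method are hypothetical names, not part of the patch. The sketch assumes records are delimited by the single byte 0x0A, which holds for ASCII and UTF-8 input but not for UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a byte stream into newline-delimited records without decoding. */
class ByteLineSplitter {
    // Bytes of the current, not yet terminated line, carried across chunks.
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();

    /** Feeds one chunk; returns every line completed by it (newline excluded) as raw bytes. */
    List<byte[]> feed(byte[] chunk) {
        List<byte[]> lines = new ArrayList<>();
        for (byte b : chunk) {
            if (b == '\n') {
                lines.add(partial.toByteArray());
                partial.reset();
            } else {
                partial.write(b); // stored as-is; no charset is ever applied
            }
        }
        return lines;
    }
}
```

Because no charset is applied, a chunk boundary that falls in the middle of a multi-byte character does no harm: the leftover bytes wait in the buffer until the rest of the line arrives.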

          Jonathan Hsieh added a comment -

          Readline / char interpretation is a related problem.


            People

            • Assignee:
              Jonathan Hsieh
              Reporter:
              Jonathan Hsieh
            • Votes:
              2
              Watchers:
              5
