Flume / FLUME-1702

HDFSEventSink should write to a hidden file as opposed to a .tmp file

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.4.0
    • Component/s: None
    • Labels: None

      Description

      Currently we write to a .tmp file. The problem is that if MR jobs are run on the directory we are writing to, it is common for a job to list the directory and pick up a .tmp file; if that file is renamed in the meantime, the job fails when it runs.

      With Java MapReduce you can use a PathFilter to avoid this; however, a custom solution is required for Pig, Hive, etc.

      Perhaps we should write to a hidden file so that MR never tries to process data in flight.
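
      As a rough illustration of the PathFilter workaround mentioned above, the sketch below shows the kind of name check such a filter would perform. It is plain Java so that it runs without Hadoop on the classpath; `TmpSuffixFilter` and the sample file names are hypothetical. In a real job the same check would presumably live in the `accept` method of a class implementing `org.apache.hadoop.fs.PathFilter`, registered through `FileInputFormat.setInputPathFilter`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Flume or Hadoop.
final class TmpSuffixFilter {
    // Suffix carried by files that are still being written, per the description above.
    static final String IN_USE_SUFFIX = ".tmp";

    // Returns true when the file name does not end with the in-use suffix.
    static boolean accept(String fileName) {
        return !fileName.endsWith(IN_USE_SUFFIX);
    }

    // Keeps only the names from a directory listing that pass accept().
    static List<String> filter(List<String> listing) {
        return listing.stream()
                .filter(TmpSuffixFilter::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed sample listing: one finished file, one still in flight.
        List<String> listing = Arrays.asList("events.1001", "events.1002.tmp");
        System.out.println(filter(listing)); // prints [events.1001]
    }
}
```

      A filter like this only protects jobs that register it, which is why Pig, Hive, and similar tools each need their own solution.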

      1. bugFLUME-1702.patch
        25 kB
        Jarek Jarcec Cecho
      2. bugFLUME-1702.patch
        25 kB
        Jarek Jarcec Cecho

          Activity

          Mike Percy added a comment -

          Whoops, didn't notice you filed this JIRA Brock. Adding description from dup ticket:

          We should add the capability to the HDFS sink to specify a prefix for the .tmp files. I believe this needs to be configurable and disabled by default.
          However, we should document that we recommend "_" or "." as a prefix for the temp files.
          This is because Hadoop's default FileInputFormat skips files whose names begin with "_" or "." (hidden files).
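
          To make the hidden-file rule concrete, here is a minimal sketch of the name check described above: a file counts as hidden when its name begins with "_" or ".". The class `HiddenFileCheck`, the helper `inUseName`, and the sample names are hypothetical and only illustrate how a configurable prefix would move an in-flight file into the hidden set; this is not the code from the patch.

```java
// Hypothetical illustration; not taken from Flume or Hadoop sources.
final class HiddenFileCheck {
    // Mirrors the rule stated above: names starting with "_" or "." are hidden.
    static boolean isHidden(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    // Builds the name of a file still being written: prefix + base name + ".tmp".
    // An empty prefix models the "disabled by default" behaviour.
    static String inUseName(String prefix, String baseName) {
        return prefix + baseName + ".tmp";
    }

    public static void main(String[] args) {
        String withoutPrefix = inUseName("", "events.1002");
        String withPrefix = inUseName("_", "events.1002");
        System.out.println(withoutPrefix + " hidden? " + isHidden(withoutPrefix));
        System.out.println(withPrefix + " hidden? " + isHidden(withPrefix));
    }
}
```

          With an empty prefix the in-flight file stays visible to any job that lists the directory; with "_" or "." it falls under the hidden-file rule without a custom filter in each tool.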

          Brock Noland added a comment -

          Looks good, I committed this to trunk and 1.4! Thanks for your patch Jarcec!

          Hudson added a comment -

          Integrated in flume-trunk #343 (See https://builds.apache.org/job/flume-trunk/343/)
          FLUME-1702: HDFSEventSink should write to a hidden file as opposed to a .tmp file (Revision ae62279f8066b1e2728a579ce8384eca36180e9d)

          Result = SUCCESS
          brock : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=ae62279f8066b1e2728a579ce8384eca36180e9d
          Files :

          • flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
          • flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst

            People

            • Assignee:
              Jarek Jarcec Cecho
              Reporter:
              Brock Noland
            • Votes:
              0
              Watchers:
              5