FLUME-1844

HDFSEventSink should have option to use RawLocalFileSystem

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.4.0
    • Component/s: None
    • Labels:
      None

      Description

      Due to HADOOP-7844, we should have a way to use RawLocalFileSystem, which does not implement flush(), with HDFSEventSink.

          Activity

          Hudson added a comment -

          Integrated in flume-trunk #352 (See https://builds.apache.org/job/flume-trunk/352/)
          FLUME-1844. HDFSEventSink should have option to use RawLocalFileSystem. (Revision 11875237420eef3e47ad84e7998e2caff6fc2de6)

          Result = SUCCESS
          harishreedharan : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=11875237420eef3e47ad84e7998e2caff6fc2de6
          Files :

          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
          • flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestUseRawLocalFileSystem.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java
          • flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java
          Brock Noland added a comment -

          Great to hear! When HADOOP-7844 is fixed we should be able to remove this code.

          Juhani Connolly added a comment - - edited

          Thanks for this, I'll give it a spin on our test machines.

          Edit: confirmed that the local disk is synced on each batch when using the new setting.

          Hari Shreedharan added a comment -

          Patch committed, rev: 11875237420eef3e47ad84e7998e2caff6fc2de6. Thanks Brock!

          Brock Noland added a comment -

          Thanks Connor! Yeah I haven't tried that. However, I think I have a pretty good solution which is testable. I'll be posting a patch soon.

          Connor Woodson added a comment -

          What if you just use the file:/// scheme but change the impl to RawLocalFileSystem (name=fs.file.impl value=...fs.RawLocalFileSystem)? Does that still cause it to complain?

          Brock Noland added a comment -

          Here is the exception.

          java.lang.IllegalArgumentException: Wrong FS: rawfile:/tmp/flume-rawfile/FlumeData.1358275526949.tmp, expected: file:///
                  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590)
                  at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
                  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:464)
                  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
                  at org.apache.flume.sink.hdfs.BucketWriter.renameBucket(BucketWriter.java:459)
                  at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:319)
                  at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:51)
                  at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:281)
                  at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:279)
                  at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:149)
                  at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:279)
                  at org.apache.flume.sink.hdfs.HDFSEventSink$4.call(HDFSEventSink.java:757)
                  at org.apache.flume.sink.hdfs.HDFSEventSink$4.call(HDFSEventSink.java:755)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:662)
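
The trace is consistent with a scheme mismatch: the message shows the file system expecting file:/// while the path carries the rawfile scheme, so the check in FileSystem.checkPath rejects it even though fs.rawfile.impl resolved to the intended class. The sketch below is a simplified, stdlib-only model of that kind of check, not Hadoop's code. The names WrongFsCheck and checkPath are illustrative, and the real check is assumed to look at more than the scheme.

```java
import java.net.URI;

/** Simplified model of a file system's path-scheme check (illustrative, not Hadoop code). */
final class WrongFsCheck {

    /**
     * Rejects a path whose scheme differs from the file system's own URI scheme.
     * Scheme-less paths are accepted, on the assumption that they resolve
     * against the file system itself.
     */
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return;
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // The URI the file system reports for itself, as shown in the trace.
        URI fsUri = URI.create("file:///");

        // Same scheme as the file system: accepted.
        checkPath(fsUri, URI.create("file:/tmp/flume-rawfile/FlumeData.tmp"));

        // Custom scheme mapped to the same implementation: rejected.
        try {
            checkPath(fsUri, URI.create("rawfile:/tmp/flume-rawfile/FlumeData.tmp"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, registering the implementation under a new scheme name is not enough on its own, because the file system still identifies itself by its original URI.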
          
          Brock Noland added a comment -

          Shoot, it doesn't look like that will work; I have posted the exception in a separate comment. Perhaps we should modify HDFSEventSink to have a flag: if the flag is set and the URL is file:///, the sink calls getRawLocalFile() on the file system object.
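
The flag idea can be sketched as a small unwrapping step. This is a minimal model using stand-in types, not the committed patch and not Hadoop's classes: Fs, RawLocalFs, ChecksummedLocalFs, RawFsSelector, and the useRawLocalFileSystem parameter name are all hypothetical. In Hadoop itself, the accessor on LocalFileSystem that returns the wrapped file system is, to my knowledge, getRaw().

```java
/** Stand-in types for illustration only; they are not Hadoop's classes. */
interface Fs {
    String describe();
}

/** Models a file system that writes straight to local disk. */
final class RawLocalFs implements Fs {
    public String describe() {
        return "raw";
    }
}

/** Models a local file system that wraps a raw one. */
final class ChecksummedLocalFs implements Fs {
    private final Fs raw = new RawLocalFs();

    Fs getRaw() {
        return raw;
    }

    public String describe() {
        return "checksummed";
    }
}

final class RawFsSelector {

    /** Returns the wrapped raw file system only when the flag is set and the type matches. */
    static Fs select(Fs fs, boolean useRawLocalFileSystem) {
        if (useRawLocalFileSystem && fs instanceof ChecksummedLocalFs) {
            return ((ChecksummedLocalFs) fs).getRaw();
        }
        // Flag unset, or not a wrapping local file system: leave it untouched.
        return fs;
    }

    public static void main(String[] args) {
        Fs local = new ChecksummedLocalFs();
        System.out.println(select(local, false).describe());
        System.out.println(select(local, true).describe());
    }
}
```

Unwrapping an instance obtained through the normal file:/// URL keeps the paths on the file scheme, which avoids introducing a second scheme name for the same file system.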

          Brock Noland added a comment -

          I wonder actually if you could use the raw local file system by setting:

          <property>
          <name>fs.rawfile.impl</name>
          <value>org.apache.hadoop.fs.RawLocalFileSystem</value>
          </property>

          and then giving the HDFS sink the URL rawfile:///

          Brock Noland added a comment -

          Looks like this might be achieved by setting fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem in core-site.xml, but I have not tested it.


            People

            • Assignee: Brock Noland
            • Reporter: Brock Noland
            • Votes: 0
            • Watchers: 6
