Flume / FLUME-1916

HDFS sink should poll for # of active replicas. If less than required, roll the file.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.3.1
    • Fix Version/s: v1.4.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      Add functionality to the HDFS sink that continuously polls the number of replicas for the files being written. If the number of replicas drops below 3 (or a configured minimum), the sink should immediately close the HDFS output file and start a new file, which should be able to create a pipeline with the correct number of replicas.
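The behavior described above boils down to a check made around each write. Below is a minimal Java sketch of that decision, not the committed patch: `ReplicaRollPolicy` and `RollingWriter` are hypothetical names rather than Flume classes, the live replica count is assumed to come from an `IntSupplier` that returns -1 when the count is unknown, and "files" are plain counters so the logic runs without HDFS.

```java
import java.util.function.IntSupplier;

/** Hypothetical roll policy: fires only when the replica count is known and below the minimum. */
final class ReplicaRollPolicy {
    private final int minReplicas;
    private final IntSupplier currentReplicas; // assumed to return -1 when the count is unknown

    ReplicaRollPolicy(int minReplicas, IntSupplier currentReplicas) {
        if (minReplicas < 1) {
            throw new IllegalArgumentException("minReplicas must be >= 1");
        }
        this.minReplicas = minReplicas;
        this.currentReplicas = currentReplicas;
    }

    boolean isUnderReplicated() {
        int replicas = currentReplicas.getAsInt();
        // An unknown count (-1) never forces a roll; only a known, too-low count does.
        return replicas >= 0 && replicas < minReplicas;
    }
}

/** Hypothetical stand-in for a bucket writer; opening a "file" just bumps a counter. */
final class RollingWriter {
    private final ReplicaRollPolicy policy;
    private int filesOpened;
    private int eventsInCurrentFile;

    RollingWriter(ReplicaRollPolicy policy) {
        this.policy = policy;
        openNewFile();
    }

    private void openNewFile() {
        // A real sink would close() the current HDFS file here and create a new one;
        // the new file's first block is written through a freshly allocated pipeline.
        filesOpened++;
        eventsInCurrentFile = 0;
    }

    void append(String event) {
        // Poll before each write; this sketch never rolls a file that is still empty.
        if (eventsInCurrentFile > 0 && policy.isUnderReplicated()) {
            openNewFile();
        }
        eventsInCurrentFile++; // stands in for writing `event` to the current file
    }

    int filesOpened() {
        return filesOpened;
    }
}
```

With a minimum of 3, losing one DataNode from a three-node pipeline causes the next append to go to a new file. Skipping the roll for an empty file is a choice made in this sketch to avoid producing empty files while the cluster is degraded.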

      This mirrors what HBase does for its write-ahead log (WAL), and it should help us avoid hitting the more complex corner cases around failed close() calls.

      1. FLUME-1916.patch (25 kB, Mike Percy)
      2. FLUME-1916-1.patch (25 kB, Mike Percy)
      3. FLUME-1916-2.patch (25 kB, Mike Percy)
      4. FLUME-1916-3.patch (25 kB, Mike Percy)
      5. FLUME-1916-4.patch (25 kB, Mike Percy)


          Activity

          Mike Percy added a comment -

          See HBASE-2234 for the HBase version of this.

          Mike Percy added a comment -

          Added a patch that accomplishes this. The test is racy, so it checks for either 4 or 5 files: we don't know whether the client notices that we killed a DataNode (DN) before it calls sync() or only when it calls sync().

          Also added a parameter for manually specifying the minimum number of replicas to tolerate before rolling the file. It defaults to the default replication factor.
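The defaulting rule in the comment above (an explicit minimum if configured, otherwise the default replication factor) can be sketched as follows. This is a hypothetical helper, not Flume code: the configuration key is not named in this comment, so the explicit value is passed in as an `OptionalInt`, and rejecting values below 1 is this sketch's own choice. In a real deployment the fallback would presumably come from HDFS's `dfs.replication` setting, for example via Hadoop's `FileSystem.getDefaultReplication(Path)`.

```java
import java.util.OptionalInt;

/** Hypothetical helper that picks the replica threshold which triggers a file roll. */
final class MinReplicasResolver {
    private MinReplicasResolver() {}

    /**
     * @param configured         explicit minimum from the sink configuration, if any
     * @param defaultReplication the file system's default replication factor
     */
    static int resolve(OptionalInt configured, int defaultReplication) {
        if (configured.isPresent()) {
            int value = configured.getAsInt();
            if (value < 1) {
                throw new IllegalArgumentException("minimum replicas must be >= 1, got " + value);
            }
            return value; // an explicit setting always wins
        }
        return defaultReplication; // otherwise fall back to the default replication factor
    }
}
```

Setting the minimum below the default (for example 2 when the default is 3) lets the sink tolerate the loss of one DataNode before rolling.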

          The nasty reflection stuff was directly purloined from HBASE-2234.
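To our understanding, the HBASE-2234 approach reaches the raw HDFS output stream (in Hadoop, `FSDataOutputStream.getWrappedStream()`) and calls its `getNumCurrentReplicas()` method through reflection, because that method is not exposed through the public `FSDataOutputStream` API. The sketch below shows only the reflection pattern. `ReflectiveReplicaCounter` is a hypothetical name, `FakeDfsOutputStream` is a hypothetical stand-in so the example runs without Hadoop on the classpath, and returning -1 when the method is missing is this sketch's convention rather than a documented contract.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Looks up and invokes a replica-count method by name, if the stream's class has one. */
final class ReflectiveReplicaCounter {
    private static final String METHOD_NAME = "getNumCurrentReplicas";

    private final Object stream;
    private final Method method; // null when no suitable method is available

    ReflectiveReplicaCounter(Object wrappedStream) {
        this.stream = wrappedStream;
        this.method = findMethod(wrappedStream.getClass());
    }

    private static Method findMethod(Class<?> type) {
        // Walk up the class hierarchy, since getDeclaredMethod ignores inherited methods.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(METHOD_NAME);
                if (m.getReturnType() != int.class) {
                    return null; // unexpected signature: treat as unavailable
                }
                m.setAccessible(true); // the declaring class or the method may not be public
                return m;
            } catch (NoSuchMethodException e) {
                // Not declared on this class; try the superclass.
            } catch (RuntimeException e) {
                return null; // e.g. a SecurityException: treat as unavailable
            }
        }
        return null;
    }

    /** Returns the current number of replicas, or -1 if it cannot be determined. */
    int currentReplicas() {
        if (method == null) {
            return -1;
        }
        try {
            return (Integer) method.invoke(stream);
        } catch (IllegalAccessException | InvocationTargetException e) {
            return -1;
        }
    }
}

/** Hypothetical stand-in for the HDFS output stream, so the sketch runs without Hadoop. */
final class FakeDfsOutputStream {
    private int replicas;

    FakeDfsOutputStream(int replicas) {
        this.replicas = replicas;
    }

    void simulateDataNodeLoss() {
        replicas--;
    }

    private int getNumCurrentReplicas() {
        return replicas;
    }
}
```

The lookup is done once, in the constructor, so each poll costs a single reflective call. Any stream that lacks the method (an older HDFS client, or a non-HDFS file system) simply reports -1.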

          Show
          Mike Percy added a comment - Added a patch that accomplishes this. The test is racy so we check for either 4 or 5 files, since we don't know whether the client notices that we killed a DN until it calls sync(). Sometimes it notices before that, sometimes it notices at that time. Also added a param to allow for manually specifying the minimum replication factor we want to allow before a file roll. The default is the default replication factor. The nasty reflection stuff was directly purloined from HBASE-2234 .
          Hide
          Mike Percy added a comment -

          Rebased patch.

          Hari Shreedharan added a comment -

          +1.

          Hari Shreedharan added a comment -

          Patch committed, rev: 17338bf303e617054576813b02d057b98753b6aa. Thanks Mike!

          Hudson added a comment -

          Integrated in flume-trunk #366 (See https://builds.apache.org/job/flume-trunk/366/)
          FLUME-1916. HDFS sink should poll for # of active replicas. If less than required, roll the file. (Revision 17338bf303e617054576813b02d057b98753b6aa)

          Result = SUCCESS
          hshreedharan : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=17338bf303e617054576813b02d057b98753b6aa
          Files :

          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AbstractHDFSWriter.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSinkOnMiniCluster.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
          • flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java
          • flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockHDFSWriter.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java
          • flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriter.java

            People

            • Assignee: Mike Percy
            • Reporter: Mike Percy
            • Votes: 0
            • Watchers: 3
