FLUME-1856

HDFS sink idleTimeout does not close removed BucketWriters

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v1.4.0, v1.3.1
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      When a BucketWriter idles out, it is removed from the bucket list but never closed. The best fix would be to override LinkedHashMap.remove, similar to the existing removeEldestEntry code, so that removing a bucket also closes it.

      The effect is that if you rely on idleTimeout to close buckets (for example, setting no roll period and rolling via a timestamped path), the buckets never get closed, so the data never reaches S3 and the heap fills up.
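
      The fix proposed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Flume's actual code: ClosingWriterMap, maxOpenFiles, and RecordingWriter are hypothetical names, and the real sink's writer cache may be structured differently.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical writer cache: closes a writer whenever it leaves the map. */
class ClosingWriterMap<W extends Closeable> extends LinkedHashMap<String, W> {
    private static final long serialVersionUID = 1L;
    private final int maxOpenFiles; // assumed cap on simultaneously open writers

    ClosingWriterMap(int maxOpenFiles) {
        super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
        this.maxOpenFiles = maxOpenFiles;
    }

    /** Eviction path: close the least recently used writer when the cap is exceeded. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, W> eldest) {
        if (size() > maxOpenFiles) {
            closeQuietly(eldest.getValue());
            return true;
        }
        return false;
    }

    /** Explicit removal path (for example on idle timeout): close before handing back. */
    @Override
    public W remove(Object key) {
        W removed = super.remove(key);
        if (removed != null) {
            closeQuietly(removed);
        }
        return removed;
    }

    private void closeQuietly(W writer) {
        try {
            writer.close();
        } catch (IOException e) {
            System.err.println("Failed to close writer: " + e.getMessage());
        }
    }
}

/** Hypothetical stand-in for a BucketWriter that only records whether it was closed. */
class RecordingWriter implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}
```

      In this sketch only put-triggered eviction and remove(key) close the writer; other removal paths such as clear() or iterator removal would bypass the override and would need the same treatment.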

        Activity

        Gopinathan A added a comment -

        No need for code changes, I feel; mentioning it in the user guide is sufficient.

        Please review the attached patch.

        Hari Shreedharan added a comment -

        This should be fixed, can you verify?

        Gopinathan A added a comment -

        Updated the Flume user guide.

        Connor Woodson added a comment -

        Alright, well, ignore this. After some digging I realized I was supposed to specify the timeout in seconds, not ms, and that the bucket will close itself (10 hours is a tad too long for a timeout...). But that leads to a follow-up JIRA to add some specificity to the User Guide.

        https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java#L358
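
        To illustrate the unit mix-up, here is a rough sketch of a seconds-based idle close. It does not reproduce the linked BucketWriter code; IdleCloser, touch, and closeAction are hypothetical names, and the only assumption taken from the comment above is that the timeout is interpreted in seconds.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not Flume's BucketWriter: runs a close action after a quiet period. */
class IdleCloser {
    private final ScheduledExecutorService pool;
    private final long idleTimeoutSeconds; // interpreted as seconds, not milliseconds
    private final Runnable closeAction;
    private ScheduledFuture<?> pending;

    IdleCloser(ScheduledExecutorService pool, long idleTimeoutSeconds, Runnable closeAction) {
        this.pool = pool;
        this.idleTimeoutSeconds = idleTimeoutSeconds;
        this.closeAction = closeAction;
    }

    /**
     * Call after every write: cancels any pending close and schedules a fresh one.
     * Returns the scheduled task, or null when the timeout is disabled (<= 0).
     */
    synchronized ScheduledFuture<?> touch() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
        if (idleTimeoutSeconds > 0) {
            pending = pool.schedule(closeAction, idleTimeoutSeconds, TimeUnit.SECONDS);
        }
        return pending;
    }
}
```

        With an illustrative value of 36000, chosen by someone thinking in milliseconds, this close would fire after 10 hours of inactivity rather than after 36 seconds.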


          People

          • Assignee: Gopinathan A
          • Reporter: Connor Woodson
          • Votes: 0
          • Watchers: 3