Flume / FLUME-2918

TaildirSource is underperforming with huge parent directories


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: Sinks+Sources
    • Release Note: Introduced an option in the Flume configuration for the TAILDIR source to cache pattern-matched files for huge directories.
    • Flags: Patch

    Description

      The Taildir source causes high CPU utilization when a large number of files sit in the target directory. In the reported case, the file pattern matches only a single file, but the parent directory contains about 50,000 other files.
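
      The caching idea named in the release note can be illustrated with a minimal sketch. It is not the code from the attached patch: the class `CachedDirMatcher` and its methods are hypothetical names, and using the parent directory's modification time as the cache-invalidation signal is an assumption made for this example. The point it shows is that the expensive step (listing roughly 50,000 entries and applying the regex to each) is skipped whenever the directory appears unchanged.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not Flume's actual class): caches the files in a
 * directory whose names match a regex, and rescans only when the directory's
 * modification time differs from the one seen at the previous scan.
 */
class CachedDirMatcher {
    private final Path parentDir;
    private final Pattern fileNamePattern;
    private List<Path> cachedMatches = Collections.emptyList();
    private long lastSeenDirMtime = -1L; // -1 means "never scanned"
    private int scanCount = 0;           // exposed only to observe cache hits

    CachedDirMatcher(Path parentDir, String fileNameRegex) {
        this.parentDir = parentDir;
        this.fileNamePattern = Pattern.compile(fileNameRegex);
    }

    /** Returns the matching files, listing the directory only if it changed. */
    List<Path> getMatchingFiles() throws IOException {
        // Assumption: adding or removing a file updates the directory mtime.
        // With coarse timestamp granularity, a change made in the same tick
        // as the previous scan could be missed by this simple check.
        long mtime = Files.getLastModifiedTime(parentDir).toMillis();
        if (mtime != lastSeenDirMtime) {
            cachedMatches = scan();
            lastSeenDirMtime = mtime;
        }
        return cachedMatches;
    }

    /** Full directory listing plus regex match: the expensive operation. */
    private List<Path> scan() throws IOException {
        scanCount++;
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parentDir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (fileNamePattern.matcher(name).matches() && Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return Collections.unmodifiableList(result);
    }

    int getScanCount() {
        return scanCount;
    }
}
```

      A caller that polls `getMatchingFiles()` repeatedly would pay for the full listing once and then for a single metadata lookup per poll, until the directory's contents change.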

      Attachments

        1. FLUME-2918-2.patch
          31 kB
          Attila Simon
        2. PerfHugeDir.java
          6 kB
          Attila Simon
        3. perftest.png
          311 kB
          Attila Simon
        4. profiling_after.png
          183 kB
          Attila Simon
        5. profiling_before.png
          515 kB
          Attila Simon
        6. test.csv
          18 kB
          Attila Simon


          People

            Assignee: Attila Simon (sati)
            Reporter: Attila Simon (sati)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
