Flume / FLUME-2918

TaildirSource is underperforming with huge parent directories


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: Sinks+Sources
    • Release Note: Introduced an option in the Flume configuration for the TAILDIR source to cache pattern-matched files for huge directories.
    • Flags: Patch

    Description

      TaildirSource causes high CPU utilization when a large number of files sit in the target directory. In the reported case the file pattern matches only a single file, but the parent directory contains about 50,000 other files.
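
      The caching idea named in the release note can be sketched as follows. This is a minimal illustration, not the code from FLUME-2918-2.patch: the class name `CachedDirMatcher`, its fields, and the choice to invalidate the cache on the parent directory's last-modified time are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not a Flume class): caches the files in one parent
// directory whose names match a regex, and rescans the directory only when
// the directory's last-modified time has changed since the previous scan.
class CachedDirMatcher {
    private final Path parentDir;
    private final Pattern fileNamePattern;
    private List<Path> cachedMatches = Collections.emptyList();
    private long lastSeenDirMtime = -1L; // -1 means "never scanned"
    int scanCount = 0;                   // number of full directory scans so far

    CachedDirMatcher(Path parentDir, String fileNameRegex) {
        this.parentDir = parentDir;
        this.fileNamePattern = Pattern.compile(fileNameRegex);
    }

    List<Path> matchingFiles() throws IOException {
        long dirMtime = Files.getLastModifiedTime(parentDir).toMillis();
        if (dirMtime == lastSeenDirMtime) {
            // Directory unchanged (as far as its mtime tells us): skip the
            // expensive listing and regex matching.
            return cachedMatches;
        }
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(parentDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (fileNamePattern.matcher(name).matches()) {
                    matches.add(entry);
                }
            }
        }
        Collections.sort(matches); // stable order for callers
        scanCount++;
        cachedMatches = Collections.unmodifiableList(matches);
        lastSeenDirMtime = dirMtime;
        return cachedMatches;
    }
}
```

      With 50,000 non-matching files in the directory, every uncached poll lists and regex-tests all of them, while a cached poll costs a single metadata lookup. One caveat of this sketch: file systems with coarse timestamp granularity can miss a file created within the same tick as the previous scan, so a real implementation would need to handle that case. In Flume itself the behaviour is controlled through the Taildir source configuration; the user guide documents a `cachePatternMatching` property for this purpose.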

      Attachments

        1. test.csv (18 kB, Attila Simon)
        2. profiling_before.png (515 kB, Attila Simon)
        3. profiling_after.png (183 kB, Attila Simon)
        4. perftest.png (311 kB, Attila Simon)
        5. PerfHugeDir.java (6 kB, Attila Simon)
        6. FLUME-2918-2.patch (31 kB, Attila Simon)


          People

            Assignee: Attila Simon (sati)
            Reporter: Attila Simon (sati)
            Votes: 0
            Watchers: 4
