Flume / FLUME-3334

TaildirSource tailFiles map causes OOM with a huge number of files

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0, 1.8.0, 1.9.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      I am using the Taildir source to monitor a log directory that receives about 100 new files per second. Flume runs with -Xmx2048m, and after 2 hours of running I get an OOM error with "Failed writing positionFile".

      A deep dive into the heap dump shows that tailFiles occupies 1.7 GB of memory. Looking at the source code, I found that Flume remembers every file that matches the file pattern in tailFiles. Could you add a property that filters files by last-modified time? The default could be infinity. With a value of 30 min, for example, a file that was last modified more than 30 minutes ago would be removed from tailFiles and no longer monitored.
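
      The requested behavior could look roughly like the sketch below. It is not Flume code: TrackedFile, evictStale, and NEVER_EVICT are hypothetical names, and the map keyed by a numeric file id is an assumption made only for illustration. It shows one way to drop entries whose last-modified time is older than a configurable idle limit, with an "infinite" default that evicts nothing.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class StaleTailFileEviction {

    /** Hypothetical stand-in for the per-file state kept in tailFiles. */
    static final class TrackedFile {
        final String path;
        final long lastModifiedMillis;

        TrackedFile(String path, long lastModifiedMillis) {
            this.path = path;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Hypothetical "infinity" default: no file is ever evicted. */
    static final long NEVER_EVICT = Long.MAX_VALUE;

    /**
     * Removes every entry that has been idle longer than maxIdleMillis
     * and returns how many entries were removed.
     */
    static int evictStale(Map<Long, TrackedFile> tailFiles, long nowMillis, long maxIdleMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, TrackedFile>> it = tailFiles.entrySet().iterator();
        while (it.hasNext()) {
            TrackedFile file = it.next().getValue();
            long idleMillis = nowMillis - file.lastModifiedMillis;
            if (idleMillis > maxIdleMillis) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        long now = TimeUnit.HOURS.toMillis(1);
        Map<Long, TrackedFile> tailFiles = new HashMap<>();
        // Idle for 60 minutes: older than the 30-minute limit.
        tailFiles.put(1L, new TrackedFile("trace-0001.log", 0L));
        // Idle for 10 minutes: still within the limit.
        tailFiles.put(2L, new TrackedFile("trace-0002.log", now - TimeUnit.MINUTES.toMillis(10)));

        int removed = evictStale(tailFiles, now, TimeUnit.MINUTES.toMillis(30));
        System.out.println("removed=" + removed + ", remaining=" + tailFiles.size());
    }
}
```

      A real change would presumably also need to skip stale files when the directory is rescanned, otherwise files that still match the pattern would be added back to the map.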

      My logs come from a real-time transaction system that writes one file per transaction, with the trace number as the file name. A transaction usually completes within several seconds, so most of the time a file receives no further updates. In the exceptional cases, Flume can just read the whole file, and we can handle those specially too.

      Please consider this scenario. Thanks.

      Attachments

        1. 20190511173448.png
          15 kB
          ZhengHanyang
        2. 20190511173521.png
          76 kB
          ZhengHanyang

          People

            Assignee: Unassigned
            Reporter: ZhengHanyang
            Votes: 0
            Watchers: 2
