Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-2118

Occasional multi-hour pauses in file channel replay

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: File Channel
    • Labels:
      None

      Description

      Sometimes during replay, immediately after an EOF of one log, the replay will pause for a long time.

      Here are two samples from this morning when we restarted our 3 aggregators and 2 of them hit this issue.

      02 7 2013 03:06:30,089 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2200000 records
      02 7 2013 03:06:30,179 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2210000 records
      02 7 2013 03:06:30,241 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:505) - Encountered EOF at 1623195625 in /data2/flume-data/log-1184
      02 7 2013 06:23:27,629 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2220000 records
      02 7 2013 06:23:28,641 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2230000 records
      02 7 2013 06:23:29,162 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2240000 records
      02 7 2013 06:23:30,118 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2250000 records
      02 7 2013 06:23:30,750 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2260000 records

      02 7 2013 08:03:00,942 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2160000 records
      02 7 2013 08:03:01,055 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2170000 records
      02 7 2013 08:03:01,168 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2180000 records
      02 7 2013 08:03:01,181 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:505) - Encountered EOF at 1623195640 in /data2/flume-data/log-1182
      02 7 2013 14:45:55,302 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2190000 records
      02 7 2013 14:45:56,282 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2200000 records
      02 7 2013 14:45:57,084 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2210000 records
      02 7 2013 14:45:59,043 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292) - Read 2220000 records

      I've tried for an hour and some to track down the cause of this. There's nothing suspicious turning up on ganglia, and a cursory review of the code didn't turn up anything overly suspicious. Owing to time limitations I can't dig further at this time.

      We run a version of flume from somewhat before the current 1.4 release candidate(hash is eefefa941a60c0982f0957804be0cafb4d83e46e) there doesn't appear to be any replay patches since then.

        Attachments

        1. gc-flume.log.20130702
          1.48 MB
          Juhani Connolly
        2. flume-log
          187 kB
          Juhani Connolly
        3. flume-thread-dump
          16 kB
          Juhani Connolly

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                juhanic Juhani Connolly
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: