FLUME-1232

Cannot start agent a 3rd time when using FileChannel

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: Channel
    • Labels: None
    • Environment: RHEL 5.6 64-bit

    Description

      Steps:
      1) Start clean by wiping out FileChannel's existing checkpoint dir and data dir.
      2) Configure the agent to use FileChannel (type = FILE). The config file I used is at the end of this text.
      3) Start the agent, confirm lock files exist in the data and checkpoint dirs, stop the agent, confirm the lock files are removed from the data and checkpoint dirs.
      4) Repeat step 3
      5) Start the agent. The following exceptions are shown in the logs:

      2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: record.getTimestamp() = 1338286275813, lastCheckpoint = 1338286279596, fileId = 1, offset = 1924, type = Commit, transaction 1338285619250
      2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: record.getTimestamp() = 1338286279783, lastCheckpoint = 1338286279596, fileId = 1, offset = 1949, type = Take, transaction 1338285619251
      2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: record.getTimestamp() = 1338286279784, lastCheckpoint = 1338286279596, fileId = 1, offset = 1980, type = Commit, transaction 1338285619251
      2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: Processing commit of Take
      2012-05-29 03:15:36,453 INFO file.ReplayHandler: Replayed 1 from /var/run/flume-ng/.flume/file-channel/data/log-1
      2012-05-29 03:15:36,453 INFO file.ReplayHandler: Replaying /var/run/flume-ng/.flume/file-channel/data/log-2
      2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: record.getTimestamp() = 1338286370280, lastCheckpoint = 1338286279596, fileId = 2, offset = 8, type = Take, transaction 1338286369982
      2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: record.getTimestamp() = 1338286370287, lastCheckpoint = 1338286279596, fileId = 2, offset = 39, type = Commit, transaction 1338286369982
      2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: Processing commit of Take
      2012-05-29 03:15:36,454 INFO file.ReplayHandler: Unable to remove FlumeEventPointer [fileID=1, offset=1853] added to pending list
      2012-05-29 03:15:36,454 INFO file.ReplayHandler: Replayed 1 from /var/run/flume-ng/.flume/file-channel/data/log-2
      2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: Pending take FlumeEventPointer [fileID=1, offset=1853]
      2012-05-29 03:15:36,455 ERROR file.Log: Failed to initialize Log
      java.lang.IllegalStateException: Pending takes 1 exist after the end of replay
      at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
      at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:137)
      at org.apache.flume.channel.file.Log.replay(Log.java:205)
      at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:180)
      at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      2012-05-29 03:15:36,457 ERROR lifecycle.LifecycleSupervisor: Unable to start org.apache.flume.channel.file.FileChannel@1ac88440 - Exception follows.
      java.lang.IllegalStateException: Log is closed
      at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
      at org.apache.flume.channel.file.Log.getFlumeEventQueue(Log.java:226)
      at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:253)
      at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:187)
      at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      2012-05-29 03:15:36,458 ERROR flume.SinkRunner: Unhandled exception, logging and sleeping for 5000ms
      java.lang.IllegalStateException: Channel closed
      at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
      at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:237)
      at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
      at org.apache.flume.sink.LoggerSink.process(LoggerSink.java:61)
      at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
      at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
      at java.lang.Thread.run(Thread.java:662)
      2012-05-29 03:15:39,460 INFO file.FileChannel: Starting FileChannel with dataDir [/var/run/flume-ng/.flume/file-channel/data]
      2012-05-29 03:15:39,460 INFO file.Log: Cannot lock /var/run/flume-ng/.flume/file-channel/checkpoint. The directory is already locked.
      2012-05-29 03:15:39,461 ERROR lifecycle.LifecycleSupervisor: Unable to start org.apache.flume.channel.file.FileChannel@1ac88440 - Exception follows.
      java.lang.RuntimeException: java.io.IOException: Cannot lock /var/run/flume-ng/.flume/file-channel/checkpoint. The directory is already locked.
      at com.google.common.base.Throwables.propagate(Throwables.java:156)
      at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:182)
      at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
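
The log lines above suggest a sequence: a committed Take is replayed, its event pointer cannot be removed from the queue, the pointer goes onto a pending list, and replay then fails because that list is not empty. The following is a toy Python model of that sequence, inferred only from the log messages. It is not Flume's ReplayHandler code. The `Record` type, the `replay` function, the rule for skipping records at or before the checkpoint, and the sample input (two committed takes of the same pointer) are all assumptions made for illustration. The ticket does not state the root cause.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one log record (not a Flume class)."""
    timestamp: int
    txn: int
    kind: str                                  # "put", "take", or "commit"
    pointer: Optional[Tuple[int, int]] = None  # (fileID, offset) for put/take


def replay(queue, records, last_checkpoint):
    """Apply committed records newer than the checkpoint to the queue."""
    pending_takes = []   # takes whose pointer was not found in the queue
    open_txns = {}       # txn id -> uncommitted put/take records
    for rec in records:
        # Assumption: records at or before the checkpoint are already
        # reflected in the checkpointed queue and are skipped.
        if rec.timestamp <= last_checkpoint:
            continue
        if rec.kind in ("put", "take"):
            open_txns.setdefault(rec.txn, []).append(rec)
        elif rec.kind == "commit":
            for op in open_txns.pop(rec.txn, []):
                if op.kind == "put":
                    if op.pointer in pending_takes:
                        pending_takes.remove(op.pointer)
                    else:
                        queue.append(op.pointer)
                elif op.pointer in queue:
                    queue.remove(op.pointer)
                else:
                    # Mirrors "Unable to remove ... added to pending list".
                    pending_takes.append(op.pointer)
    if pending_takes:
        raise RuntimeError(
            f"Pending takes {len(pending_takes)} exist after the end of replay"
        )
    return queue


if __name__ == "__main__":
    ptr = (1, 1853)  # the pointer named in the log; its reuse here is hypothetical
    records = [
        Record(1338286279783, 1338285619251, "take", ptr),
        Record(1338286279784, 1338285619251, "commit"),
        Record(1338286370280, 1338286369982, "take", ptr),
        Record(1338286370287, 1338286369982, "commit"),
    ]
    try:
        replay([ptr], records, last_checkpoint=1338286279596)
    except RuntimeError as err:
        print(err)
```

In this model the first take removes the pointer from the checkpointed queue, and the second take of the same pointer has nothing left to remove, so it stays pending and the final check fails. Whether this is what happened in the reported run is not established by the ticket.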

      Config file I used (flume.conf):

      agent.channels = c1
      agent.sources = r1
      agent.sinks = k1
      #
      agent.channels.c1.type = FILE
      #
      agent.sources.r1.channels = c1
      agent.sources.r1.type = NETCAT
      agent.sources.r1.bind = 0.0.0.0
      agent.sources.r1.port = 41414
      #
      agent.sinks.k1.channel = c1
      agent.sinks.k1.type = LOGGER
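
The lock-file check from step 3 can be scripted so that it is repeated the same way after every start and stop. Below is a minimal sketch in Python. The two directories are the ones that appear in the logs above. The lock file name `in_use.lock` is an assumption and is exposed as a parameter, and the `lock_files` helper is hypothetical.

```python
from pathlib import Path

# Directories taken from the log output in this report.
CHANNEL_DIRS = [
    "/var/run/flume-ng/.flume/file-channel/checkpoint",
    "/var/run/flume-ng/.flume/file-channel/data",
]


def lock_files(dirs, lock_name="in_use.lock"):
    """Return the lock files that currently exist under the given directories.

    lock_name is an assumed default; adjust it if the channel uses another name.
    """
    candidates = [Path(d) / lock_name for d in dirs]
    return [p for p in candidates if p.is_file()]


if __name__ == "__main__":
    found = lock_files(CHANNEL_DIRS)
    if found:
        for path in found:
            print(f"lock present: {path}")
    else:
        print("no lock files found")
```

Run it once while the agent is up (both locks are expected) and once after the agent stops (none are expected). A lock that remains after a failed start would be consistent with the "Cannot lock ... The directory is already locked" message in the log.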

      Attachments

        1. FLUME-1232-2.patch
          71 kB
          Arvind Prabhakar
        2. FLUME-1232-3.patch
          71 kB
          Arvind Prabhakar
        3. FLUME-1232-4.patch
          72 kB
          Arvind Prabhakar

            People

              Assignee: aprabhakar Arvind Prabhakar
              Reporter: will@cloudera.com Will McQueen