FLUME-3320: Taildir source reset position bug


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0, 1.8.0
    • Fix Version/s: None
    • Component/s: Master, Sinks+Sources
    • Labels: None

    Description

      1) Background

      We use Flume to transport logs from local files to Kafka.

      Every night at midnight, the local file is moved to another directory and compressed as a backup; after that, the old file's inode is released. A new log file with the same filename is then created in the monitored directory. Since Flume tracks files by inode, we had never seen any conflict before.
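
      For reference, here is a minimal, hypothetical sketch of how a file's inode can be read from Java on Linux; the taildir source keys its position map on this value. The class and method names below are illustrative, not Flume's actual code:

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;

      public class InodeExample {
          // Read the inode number of a file; the "unix:ino" attribute is only
          // available on POSIX filesystems, which is where inode-based tailing applies.
          static long inodeOf(Path path) throws IOException {
              return ((Number) Files.getAttribute(path, "unix:ino")).longValue();
          }

          public static void main(String[] args) throws IOException {
              Path log = Paths.get("/data/logs/nginx/stm-collect.cn.miaozhen.com.log");
              System.out.println("inode = " + inodeOf(log));
          }
      }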

      2) Problem

      2019-02-23 23:59:34,157 (Log-BackgroundWorker-c1) [INFO - org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint(EventQueueBackingStoreFile.java:230)] Start checkpoint for /data/flume/filechannel/checkpoint/checkpoint, elements to sync = 3406
      2019-02-23 23:59:34,167 (Log-BackgroundWorker-c1) [INFO - org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:255)] Updating checkpoint metadata: logWriteOrderID: 1550778307223, queueSize: 0, queueHead: 1754854
      2019-02-23 23:59:34,176 (Log-BackgroundWorker-c1) [INFO - org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1065)] Updated checkpoint for file: /data/flume/filechannel/data/log-68 position: 149727224 logWriteOrderID: 1550778307223
      2019-02-24 00:00:01,550 (PollableSourceRunner-TaildirSource-r1) [INFO - org.apache.flume.source.taildir.ReliableTaildirEventReader.updateTailFiles(ReliableTaildirEventReader.java:257)] Pos 15927858851 is larger than file size! Restarting from pos 0, file: /data/logs/nginx/stm-collect.cn.miaozhen.com.log, inode: 3932190
      2019-02-24 00:00:01,550 (PollableSourceRunner-TaildirSource-r1) [INFO - org.apache.flume.source.taildir.TailFile.updatePos(TailFile.java:126)] Updated position, file: /data/logs/nginx/stm-collect.cn.miaozhen.com.log, inode: 3932190, pos: 0
      2019-02-24 00:00:04,177 (Log-BackgroundWorker-c1) [INFO - org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint(EventQueueBackingStoreFile.java:230)] Start checkpoint for /data/flume/filechannel/checkpoint/checkpoint, elements to sync = 33326
      2019-02-24 00:00:04,189 (Log-BackgroundWorker-c1) [INFO - org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:255)] Updating checkpoint metadata: logWriteOrderID: 1550778350086, queueSize: 24452, queueHead: 1763726
      2019-02-24 00:00:04,213 (Log-BackgroundWorker-c1) [INFO - org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1065)] Updated checkpoint for file: /data/flume/filechannel/data/log-68 position: 190948775 logWriteOrderID: 1550778350086

       Normally, Flume opens the new file and gets its inode. If that inode is already in Map<inode, TailFile> and the TailFile's position is larger than the file size, the position is reset to 0. The system had worked this way for a long time.
      But this time Flume failed to get the new file's inode; in the log above there is no 'Opening file ...' line. Yet somehow the old log file's position was reset, and all of the data for Feb 23 was sent twice.
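
      The reset behaviour in that log line can be summarised with a small sketch; this is a simplification under my reading of the taildir source, not its exact code:

      import java.io.File;

      class PositionResetSketch {
          // If the position recorded for an inode is larger than the length of the
          // file currently at the monitored path, taildir restarts from 0, which is
          // what "Pos ... is larger than file size! Restarting from pos 0" reports.
          static long nextPosition(long recordedPos, File current) {
              long length = current.length();   // length of whatever the path points at now
              return (recordedPos > length) ? 0L : recordedPos;
          }
      }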

      3) Question
      Logically, Flume associates the real file with its TailFile object by inode. Even though I got the old file's inode, I still cannot figure out why tailFile.getPos() would be larger than f.length().
      Has this happened before? Should I blame this on Java NIO?
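
      To make the rotation scenario concrete, here is a small, self-contained reproduction of the nightly move described above (paths and sizes are invented). It only demonstrates that, right after the move, the length seen at the monitored path is the new file's small size, while a position recorded against the old file is still large; it does not claim to explain the missing 'Opening file ...' line:

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;

      public class RotationDemo {
          static long inode(Path p) throws IOException {
              return ((Number) Files.getAttribute(p, "unix:ino")).longValue();
          }

          public static void main(String[] args) throws IOException {
              Path dir = Files.createTempDirectory("taildir-demo");
              Path live = dir.resolve("app.log");
              Path backup = dir.resolve("app.log.1");

              Files.write(live, new byte[1_000_000]);   // "old" log, 1 MB
              long oldInode = inode(live);

              Files.move(live, backup);                  // nightly move for compression
              Files.write(live, new byte[10]);           // new file, same name, new inode

              System.out.println("old inode: " + oldInode + ", new inode: " + inode(live));
              // A position recorded for the old file (e.g. 1,000,000) is now larger than
              // the length reported at the monitored path:
              System.out.println("length at monitored path: " + Files.size(live));
          }
      }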

      Attachments

        1. error.log (83 kB, uploaded by zzf)


        People

            Assignee: Unassigned
            Reporter: zzf (archer4z)
            Votes: 2
            Watchers: 6

            Dates

              Created:
              Updated:

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 1h 20m