FLUME-3366: HDFS sink can't close (and rename?) file after hdfs.DFSClient timeout


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      There seems to be a problem related to hdfs timeouts.

      2020-05-03 20:50:39,819 INFO org.apache.flume.sink.hdfs.BucketWriter: Creating hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
      2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing idle bucketWriter hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp at 1588528912208
      2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.HDFSEventSink: Writer callback called.
      2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
      2020-05-03 21:03:07,282 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614
      java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.30.3.187:40062 remote=/10.30.3.187:50010]
      at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
      at java.io.FilterInputStream.read(FilterInputStream.java:83)
      at java.io.FilterInputStream.read(FilterInputStream.java:83)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2354)
      at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1093)
      2020-05-03 21:03:07,283 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614 in pipeline DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK], DatanodeInfoWithStorage[10.30.3.196:50010,DS-bdc91c20-e8d0-4d43-a1b0-41cabb8306b3,DISK], DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK]: bad datanode DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK]
      2020-05-03 21:05:22,463 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
      java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
      2020-05-03 21:05:22,474 INFO org.apache.flume.sink.hdfs.BucketWriter: Renaming hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp to hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807
      2020-05-03 21:08:22,466 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
      java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
      2020-05-03 21:11:22,467 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
      java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)

      ...

      2020-05-10 13:14:08,742 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
      java.io.IOException: All datanodes DatanodeInfoWithStorage[10.121.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
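      As an aside, the 75000 ms timeout in the first stack trace is consistent with the usual DFS client read-timeout derivation (assuming Hadoop 2.x defaults, where dfs.client.socket-timeout is 60 s and each datanode in the write pipeline adds a 5 s extension):

      60000 ms + 3 datanodes x 5000 ms/datanode = 75000 ms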

       

      Flume keeps retrying the close indefinitely, every 180 seconds.
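      The endless loop matches the HDFS sink defaults: hdfs.closeTries is 0 (retry forever) and hdfs.retryInterval is 180 seconds, the exact cadence seen in the log. One mitigation is to cap the retries; a minimal configuration sketch, where the agent and sink names (agent1, hdfsSink) are placeholders:

      # Placeholder agent/sink names; only the hdfs.* keys matter.
      # closeTries = 0 (the default) retries forever; a positive value gives up after N attempts.
      agent1.sinks.hdfsSink.hdfs.closeTries = 3
      # Seconds between consecutive close attempts (default 180).
      agent1.sinks.hdfsSink.hdfs.retryInterval = 180

      Note that giving up after N attempts leaves the file open on the NameNode side, so the lease still has to be recovered out of band (see the recoverLease workaround below).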

      First of all, the file has already been renamed from .tmp to its final name, yet the close retries are still logged with the .tmp suffix.

      -rw-r--r-- 3 flume flume 5054814 2020-05-03 20:50 hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807
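      The timeline suggests the close-retry task holds on to the path it captured at creation time, while the rename goes ahead independently. A hypothetical, minimal Java illustration of that pattern (not Flume source; names and messages are made up to mirror the log):

      import java.io.IOException;
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;

      // Hypothetical sketch, NOT Flume source: a scheduled close-retry task
      // that captured the .tmp path once keeps failing (and logging that
      // stale path) after the rename has already happened.
      public class StaleCloseRetry {

          // Stands in for the dead DFSOutputStream: once the pipeline has
          // aborted, every further close attempt fails the same way.
          static void closeDeadStream() throws IOException {
              throw new IOException("All datanodes ... are bad. Aborting...");
          }

          public static void main(String[] args) {
              ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
              String tmpPath = "hdfs://clname/data/logs/example.tmp"; // captured once, never refreshed

              Runnable closeRetry = new Runnable() {
                  @Override public void run() {
                      try {
                          closeDeadStream();
                          timer.shutdown();
                      } catch (IOException e) {
                          // Reschedules itself forever; nothing cancels the
                          // task or updates the path after the rename below.
                          System.err.println("Closing file: " + tmpPath + " failed. Will retry again in 180 seconds.");
                          timer.schedule(this, 180, TimeUnit.SECONDS);
                      }
                  }
              };

              timer.execute(closeRetry);                 // close fails, retry loop starts
              System.err.println("Renaming " + tmpPath); // rename proceeds regardless
          }
      }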

      The file is also still listed as open:

      $ hdfs dfsadmin -listOpenFiles|grep 5-03-20.1588528239807
      10.121.3.187 DFSClient_NONMAPREDUCE_833895359_50 /data/logs/logone-2020-05-03-20.1588528239807

      hdfs fsck -files -blocks -locations on this file runs without error, but lists no blocks (the file is open for write):

       Total symlinks:                0 (Files currently being written: 1)

       Total blocks (validated):      0 (Total open file blocks (not validated): 1)
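      Since the last block is never finalized, the NameNode holds the lease until it is recovered. Assuming the cluster runs Hadoop 2.7 or later, lease recovery can be forced manually as a workaround:

      $ hdfs debug recoverLease -path /data/logs/logone-2020-05-03-20.1588528239807 -retries 5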

       

People

    • Assignee: Unassigned
    • Reporter: Anes Mukhametov
