Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.6.0
-
None
-
None
-
cdh 5.16.2
Sink configuration
2hdfs.sinks.scribe2hdfsprd.type = hdfs
2hdfs.sinks.scribe2hdfsprd.channel = kafka
2hdfs.sinks.scribe2hdfsprd.hdfs.path = hdfs://clname/data/logs/
2hdfs.sinks.scribe2hdfsprd.hdfs.filePrefix = logone-%Y-%m-%d-%H
2hdfs.sinks.scribe2hdfsprd.hdfs.rollSize = 268435456
2hdfs.sinks.scribe2hdfsprd.hdfs.rollInterval = 3600
2hdfs.sinks.scribe2hdfsprd.hdfs.rollCount = 0
2hdfs.sinks.scribe2hdfsprd.hdfs.idleTimeout = 300
2hdfs.sinks.scribe2hdfsprd.hdfs.batchSize = 1000
2hdfs.sinks.scribe2hdfsprd.hdfs.minBlockReplicas = 3
2hdfs.sinks.scribe2hdfsprd.hdfs.threadsPoolSize = 600
2hdfs.sinks.scribe2hdfsprd.hdfs.rollTimerPoolSize = 10
2hdfs.sinks.scribe2hdfsprd.hdfs.callTimeout = 300000
2hdfs.sinks.scribe2hdfsprd.hdfs.fileType = DataStream
2hdfs.sinks.scribe2hdfsprd.hdfs.writeFormat = Text
2hdfs.sinks.scribe2hdfsprd.hdfs.proxyUser = stat
2hdfs.sinks.scribe2hdfsprd.sink.serializer = text
2hdfs.sinks.scribe2hdfsprd.sink.serializer.appendNewline = falsecdh 5.16.2 Sink configuration 2hdfs.sinks.scribe2hdfsprd.type = hdfs 2hdfs.sinks.scribe2hdfsprd.channel = kafka 2hdfs.sinks.scribe2hdfsprd.hdfs.path = hdfs://clname/data/logs/ 2hdfs.sinks.scribe2hdfsprd.hdfs.filePrefix = logone-%Y-%m-%d-%H 2hdfs.sinks.scribe2hdfsprd.hdfs.rollSize = 268435456 2hdfs.sinks.scribe2hdfsprd.hdfs.rollInterval = 3600 2hdfs.sinks.scribe2hdfsprd.hdfs.rollCount = 0 2hdfs.sinks.scribe2hdfsprd.hdfs.idleTimeout = 300 2hdfs.sinks.scribe2hdfsprd.hdfs.batchSize = 1000 2hdfs.sinks.scribe2hdfsprd.hdfs.minBlockReplicas = 3 2hdfs.sinks.scribe2hdfsprd.hdfs.threadsPoolSize = 600 2hdfs.sinks.scribe2hdfsprd.hdfs.rollTimerPoolSize = 10 2hdfs.sinks.scribe2hdfsprd.hdfs.callTimeout = 300000 2hdfs.sinks.scribe2hdfsprd.hdfs.fileType = DataStream 2hdfs.sinks.scribe2hdfsprd.hdfs.writeFormat = Text 2hdfs.sinks.scribe2hdfsprd.hdfs.proxyUser = stat 2hdfs.sinks.scribe2hdfsprd.sink.serializer = text 2hdfs.sinks.scribe2hdfsprd.sink.serializer.appendNewline = false
Description
There seems to be a problem related to hdfs timeouts.
2020-05-03 20:50:39,819 INFO org.apache.flume.sink.hdfs.BucketWriter: Creating hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing idle bucketWriter hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp at 1588528912208
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.HDFSEventSink: Writer callback called.
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
2020-05-03 21:03:07,282 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.30.3.187:40062 remote=/10.30.3.187:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2354)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1093)
2020-05-03 21:03:07,283 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614 in pipeline DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK], DatanodeInfoWithStorage[10.30.3.196:50010,DS-bdc91c20-e8d0-4d43-a1b0-41cabb8306b3,DISK], DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK]: bad datanode DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK]
020-05-03 21:05:22,463 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
2020-05-03 21:05:22,474 INFO org.apache.flume.sink.hdfs.BucketWriter: Renaming hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp to hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807
2020-05-03 21:08:22,466 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
2020-05-03 21:11:22,467 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)...
2020-05-10 13:14:08,742 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.121.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
Flume infinitely tries to close the file.
First of all, the file is renamed from .tmp to it's normal name already. But still logged with tmp suffix.
rw-rr- 3 flume flume 5054814 2020-05-03 20:50 hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807
It is in open state also
$ hdfs dfsadmin -listOpenFiles|grep 5-03-20.1588528239807
10.121.3.187 DFSClient_NONMAPREDUCE_833895359_50 /data/logs/logone-2020-05-03-20.1588528239807
hdfs fsck -files -blocks -locations on this file runs without any error, but with no any block listed (open for write)
Total symlinks: 0 (Files currently being written: 1)
Total blocks (validated): 0 (Total open file blocks (not validated): 1)