Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.4.3
Fix Version/s: None
Environment: Hadoop 2.8.4
Description
I have been testing what happens to a running Structured Streaming query that writes to HDFS when all datanodes are down/stopped, or when the whole cluster is down (including the namenode).
So I created a structured stream from Kafka to a file output sink on HDFS and tested some scenarios.
The streaming query we used was very simple:
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.types.DataTypes;

// `spark` is an already-built SparkSession
spark.readStream()
     .format("kafka")
     .option("kafka.bootstrap.servers", "kafka.server:9092...")
     .option("subscribe", "test_topic")
     .load()
     .select(col("value").cast(DataTypes.StringType))
     .writeStream()
     .format("text")
     .option("path", "HDFS/PATH")
     .option("checkpointLocation", "checkpointPath")
     .start()
     .awaitTermination();
After stopping all the datanodes, the process starts logging errors saying the datanodes are bad.
That is expected:
2019-07-03 15:55:00 [spark-listener-group-eventLog] ERROR org.apache.spark.scheduler.AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.2.12.202:50010,DS-d2fba01b-28eb-4fe4-baaa-4072102a2172,DISK]] are bad. Aborting...
	at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1530)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1465)
	at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1237)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:657)
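To separate Spark's behavior from the raw HDFS client, the same failure can be exercised with a small standalone writer; this is only a sketch, and the namenode URI and output path are placeholders:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRecoveryCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode URI; replace with the real one
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        try (FSDataOutputStream out = fs.create(new Path("/tmp/recovery-check"))) {
            for (int i = 0; i < 600; i++) {
                out.writeBytes("line " + i + "\n");
                out.hflush();        // pushes the data through the datanode write pipeline
                Thread.sleep(1000);  // stop and restart the datanodes while this loop runs
            }
        }
    }
}

A writer like this makes it easy to see whether the bare HDFS client recovers once the datanodes come back.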
The problem is that even after starting the datanodes again, the process keeps logging the same error, over and over.
We checked, and the writeStream to HDFS recovered successfully once the datanodes were back up; the output sink worked again without problems.
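Note that the stack trace above comes from the EventLoggingListener (the writer for the Spark event log), not from the streaming sink, which would explain why the sink recovers while the error keeps repeating. As an untested mitigation sketch (an assumption on my part, at the cost of losing history-server logs), the event log could be disabled when building the session:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("kafka-to-hdfs-test")
        // Assumption: with the event log disabled, no long-lived HDFS stream
        // is left behind pointing at the dead write pipeline.
        .config("spark.eventLog.enabled", "false")
        .getOrCreate();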
I have been trying some different HDFS client configurations to make sure it is not a client-side configuration problem, but I have no clue how to fix it.
It seems that something is stuck indefinitely in an error loop.
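For anyone trying to reproduce this, the standard HDFS client settings that govern replacing bad datanodes in a write pipeline are the dfs.client.block.write.replace-datanode-on-failure.* keys. A sketch of passing them through Spark's spark.hadoop.* passthrough follows; the values shown are illustrative, not a confirmed fix:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        // spark.hadoop.* entries are forwarded to the Hadoop Configuration
        // used by the HDFS client.
        .config("spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.enable", "true")
        .config("spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.policy", "ALWAYS")
        .config("spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.best-effort", "true")
        .getOrCreate();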