Details
-
Bug
-
Status: Closed
-
Not a Priority
-
Resolution: Information Provided
-
1.6.2
-
None
-
flink1.6.2, linux
Description
0down vote favorite
I am using IntervalJoin function to join two streams within 10 minutes. As below:
labelStream.intervalJoin(adLogStream)
.between(Time.milliseconds(0), Time.milliseconds(600000))
.process(new processFunction())
.sink(kafkaProducer)
labelStream and adLogStream are proto-buf class that are keyed by Long id.
Our two input-streams are huge. After running about 30minutes, the output to kafka go down slowly, like this:
When data output begins going down, I use jstack and pstack sevaral times to get these:
It seems the program is stucked in rockdb's seek. And I find that some rockdb's srt file are accessed slowly by iteration.
I have tried several ways:
1)Reduce the input amount to half. This works well.
2)Replace labelStream and adLogStream with simple Strings. This way, data amount will not change. This works well.}}
3)Use PredefinedOptions like SPINNING_DISK_OPTIMIZED and SPINNING_DISK_OPTIMIZED_HIGH_MEM. This still fails.}}
4)Use new versions of rocksdbjni. This still fails.}}
Can anyone give me some suggestions? Thank you very much.