[FLINK-10949] When use flink-1.6.2's intervalJoin funtion, the thread is stucked in rockdb's seek for too long time - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Not a Priority
Resolution: Information Provided
Affects Version/s: 1.6.2
Fix Version/s: None
Component/s: Runtime / State Backends
Labels:
Environment:

flink1.6.2, linux

Description

0down vote favorite

I am using IntervalJoin function to join two streams within 10 minutes. As below:

labelStream.intervalJoin(adLogStream)
.between(Time.milliseconds(0), Time.milliseconds(600000))
.process(new processFunction())
.sink(kafkaProducer)
labelStream and adLogStream are proto-buf class that are keyed by Long id.

Our two input-streams are huge. After running about 30minutes, the output to kafka go down slowly, like this:

When data output begins going down, I use jstack and pstack sevaral times to get these:

It seems the program is stucked in rockdb's seek. And I find that some rockdb's srt file are accessed slowly by iteration.

I have tried several ways:

1)Reduce the input amount to half. This works well.

2)Replace labelStream and adLogStream with simple Strings. This way, data amount will not change. This works well.}}

3)Use PredefinedOptions like SPINNING_DISK_OPTIMIZED and SPINNING_DISK_OPTIMIZED_HIGH_MEM. This still fails.}}

4)Use new versions of rocksdbjni. This still fails.}}

Can anyone give me some suggestions? Thank you very much.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Liu

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 20/Nov/18 13:26

Updated:: 08/Dec/21 08:41

Resolved:: 08/Dec/21 08:41