Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-10949

When use flink-1.6.2's intervalJoin funtion, the thread is stucked in rockdb's seek for too long time

    XMLWordPrintableJSON

    Details

      Description

      0down vote favorite
       
      I am using IntervalJoin function to join two streams within 10 minutes. As below:

       
      labelStream.intervalJoin(adLogStream)
                               .between(Time.milliseconds(0), Time.milliseconds(600000)) 
                 .process(new processFunction())
                 .sink(kafkaProducer)
      labelStream and adLogStream are proto-buf class that are keyed by Long id.

      Our two input-streams are huge. After running about 30minutes, the output to kafka go down slowly, like this: 

      When data output begins going down, I use jstack and pstack sevaral times to get these: 

      It seems the program is stucked in rockdb's seek. And I find that some rockdb's srt file are accessed slowly by iteration. 

      I have tried several ways:

      1)Reduce the input amount to half. This works well.

      2)Replace labelStream and adLogStream with simple Strings. This way, data amount will not change. This works well.}}

      3)Use PredefinedOptions like SPINNING_DISK_OPTIMIZED and SPINNING_DISK_OPTIMIZED_HIGH_MEM. This still fails.}}

      4)Use new versions of rocksdbjni. This still fails.}}

      Can anyone give me some suggestions? Thank you very much.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              Jiangang Liu
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: