Flink / FLINK-32127

Source busy time is inaccurate in many cases


Details

    Description

      We found that source busy time is inaccurate in many cases. The reason is that sources are usually multi-threaded (Kafka and RocketMQ, for example): a fetcher thread fetches data from the data source, a consumer thread deserializes the data, and a blocking queue sits in between. A source is considered:

      1. idle if the consumer is blocked waiting for data from the queue
      2. backpressured if the consumer is blocked writing data to downstream operators
      3. busy otherwise

      However, this means that if the bottleneck is on the fetcher side, the consumer is often blocked waiting for data from the queue. The source idle time is then reported as high, even though the source is in fact busy and consumes a lot of CPU. In some of our jobs, the maximum reported source busy time is only ~600 ms per second even though the source has actually reached its throughput limit.

      The bottleneck can be on the fetcher side when, for example, Kafka uses zstd compression: decompression in the Kafka client on the fetcher thread can be much heavier than data deserialization on the consumer thread.
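The effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Flink's actual implementation: it assumes busy time is whatever remains of a 1000 ms window after idle and backpressured time are subtracted, and the names `reported_busy_ms` and `simulate_window`, as well as the per-batch costs, are hypothetical values chosen for illustration.

```python
def reported_busy_ms(idle_ms: float, backpressured_ms: float,
                     window_ms: float = 1000.0) -> float:
    """Busy time as the remainder of the window (assumed accounting rule)."""
    return max(0.0, window_ms - idle_ms - backpressured_ms)


def simulate_window(fetch_cost_ms: float, deserialize_cost_ms: float,
                    window_ms: float = 1000.0) -> dict:
    """Toy model of one measurement window with a saturated fetcher thread.

    The fetcher needs fetch_cost_ms per batch (e.g. decompression) and never
    rests. The consumer needs deserialize_cost_ms per batch and waits on the
    queue the rest of the time. No backpressure is modeled.
    """
    batches = window_ms / fetch_cost_ms              # batches the fetcher can deliver
    consumer_work_ms = min(window_ms, batches * deserialize_cost_ms)
    idle_ms = window_ms - consumer_work_ms           # consumer blocked on the queue
    return {
        "fetcher_busy_ms": min(window_ms, batches * fetch_cost_ms),
        "consumer_idle_ms": idle_ms,
        "reported_busy_ms": reported_busy_ms(idle_ms, backpressured_ms=0.0),
    }


if __name__ == "__main__":
    # Hypothetical costs: 10 ms to fetch and decompress a batch, 6 ms to deserialize it.
    print(simulate_window(fetch_cost_ms=10.0, deserialize_cost_ms=6.0))
```

With these hypothetical costs the fetcher thread is fully occupied for the whole window, yet the metric derived from the consumer thread alone shows the source as busy for only 600 ms, because the consumer spends the remaining 400 ms blocked on the queue.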

          People

            mxm Maximilian Michels
            Zhanghao Chen Zhanghao Chen