FLINK-15924

Detect and log blocking main thread operations

Details

    Description

      When using an RpcEndpoint, operations that run on the main thread must never block. Past experience shows that blocking operations are hard to catch reliably in reviews, and some of these changes have caused instabilities in Flink. Once that happens, it is not trivial to find the operation responsible for the blocking.

      One way to make debugging easier is to add a monitor that detects and logs any RpcEndpoint operation that takes longer than, for example, n seconds. Depending on the monitor's overhead, one could even enable it only through a special configuration (e.g. a debug mode).

      A suitable place for this monitor could be the AkkaRpcActor, which is responsible for executing main thread operations. Whenever we schedule an operation, we could start a timeout; if the timeout fires before the operation has completed, a warning is logged.
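
      The timeout idea above could be sketched roughly as follows. This is a minimal illustration and not Flink code: the class MainThreadWatchdog, its method runMonitored, and the warningSink callback (standing in for a logger) are hypothetical names, and how such a monitor would be wired into the AkkaRpcActor is left open. A separate timer thread fires if the operation has not finished within the threshold and records the stack trace of the executing thread, which may help locate the blocking call.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not an existing Flink class.
final class MainThreadWatchdog implements AutoCloseable {

    private final ScheduledExecutorService timer;
    private final long thresholdMillis;
    private final Consumer<String> warningSink; // stands in for a logger

    MainThreadWatchdog(Duration threshold, Consumer<String> warningSink) {
        this.thresholdMillis = threshold.toMillis();
        this.warningSink = warningSink;
        // Daemon timer thread so the watchdog never keeps the JVM alive.
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "main-thread-watchdog");
            t.setDaemon(true);
            return t;
        });
    }

    // Runs the operation on the calling thread and warns if it exceeds the threshold.
    void runMonitored(String description, Runnable operation) {
        final Thread mainThread = Thread.currentThread();
        final ScheduledFuture<?> timeout = timer.schedule(
                () -> warningSink.accept(describe(description, mainThread)),
                thresholdMillis,
                TimeUnit.MILLISECONDS);
        try {
            operation.run();
        } finally {
            // Operation finished (or failed): the pending timeout is no longer needed.
            timeout.cancel(false);
        }
    }

    // Builds the warning text, including where the monitored thread is currently stuck.
    private String describe(String description, Thread mainThread) {
        StringBuilder sb = new StringBuilder()
                .append("Operation '").append(description)
                .append("' has been running for more than ")
                .append(thresholdMillis).append(" ms on thread ")
                .append(mainThread.getName());
        for (StackTraceElement frame : mainThread.getStackTrace()) {
            sb.append(System.lineSeparator()).append("    at ").append(frame);
        }
        return sb.toString();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

      In this sketch the per-operation cost is one schedule and one cancel on the timer, which is the overhead that would decide whether the monitor can be always on or should sit behind a debug option.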

            People

              Assignee: Unassigned
              Reporter: Till Rohrmann (trohrmann)
              Votes: 0
              Watchers: 4