Flink / FLINK-36140

Log a warning when pods are terminated by Kubernetes

Details

    Description

      Scheduled maintenance or buggy nodes on Kubernetes can result in random pod terminations and, eventually, a series of job restarts as the Kubernetes cluster nodes are restarted in a rolling fashion. The larger the job, the higher the chance that it is affected. Jobs should be able to recover automatically from these issues, but the restarts can cause unwanted turbulence in large-scale pipelines.

      In this case, it is very difficult to identify what is causing the restarts without already knowing about the issue at the Kubernetes layer and the keyword to search for, because the pod termination is logged only at INFO level.

      We need to log this at a higher level. If changing it from INFO to ERROR breaks monitoring, we should at least log it as a warning.
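
      The proposed change could be sketched roughly as follows. This is a minimal illustration, not Flink's actual code: the class PodTerminationLogging, its methods, and the message wording are all hypothetical, and it uses java.util.logging only to stay self-contained (Flink itself logs through SLF4J, where the equivalent level would be WARN).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper, not part of Flink: chooses a log level for pod stop events. */
public class PodTerminationLogging {
    private static final Logger LOG =
            Logger.getLogger(PodTerminationLogging.class.getName());

    /** Terminations initiated by Kubernetes are raised to WARNING; other stops stay at INFO. */
    public static Level levelFor(boolean terminatedByKubernetes) {
        return terminatedByKubernetes ? Level.WARNING : Level.INFO;
    }

    /** Builds a message with a stable phrase that operators can search for. */
    public static String message(String podName, boolean terminatedByKubernetes) {
        return terminatedByKubernetes
                ? "Pod " + podName + " was terminated by Kubernetes"
                : "Pod " + podName + " stopped";
    }

    /** Logs the pod stop event at the level chosen by levelFor. */
    public static void logPodStopped(String podName, boolean terminatedByKubernetes) {
        LOG.log(levelFor(terminatedByKubernetes), message(podName, terminatedByKubernetes));
    }

    public static void main(String[] args) {
        logPodStopped("taskmanager-1", true);  // emitted at WARNING
        logPodStopped("taskmanager-2", false); // emitted at INFO
    }
}
```

      Keeping the level decision in one small method would make it easy to fall back from ERROR to a warning if ERROR turns out to break existing monitoring.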


            People

              claraxiong Clara Xiong
