FLINK-28024

KeyedStateCheckpointingITCase.KeyedStateCheckpointingITCase ends up in infinite failover loop

Details

    Description

      We have already observed several cases in which log files grew beyond 120 GB. This filled the worker's disk to 100%, which took the worker machine "offline", i.e. it was no longer available to pick up new tasks.

      The excessive logging initially observed is caused by a TaskManager failing fatally. As a result, the requested number of slots never becomes available, and the test job ends up in an infinite failover/restart loop. See the comment section for further details.

            People

              mapohl Matthias Pohl
              mapohl Matthias Pohl