Flink / FLINK-6042

Display last n exceptions/causes for job restarts in Web UI

Details

    • Flink now exposes the exception history through the REST API and the Web UI. The number of most recently handled exceptions to track can be set through `web.exception-history-size`. Some fields of the exception history's REST API JSON response are deprecated as part of this effort.

    Description

      Users have asked to see the last n exceptions that caused a job restart in the Web UI. This would make it easier to debug and operate a job.

      We could store the root causes of failures, similar to how prior executions are stored in the ExecutionVertex using the EvictingBoundedList, and then serve this information via the Web UI.
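
      The idea of keeping only the last n root causes could look roughly like the sketch below. It is a minimal illustration only: `BoundedExceptionHistory` and its methods are hypothetical names and do not reproduce Flink's `EvictingBoundedList`. It assumes a fixed capacity and evicts the oldest cause first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Flink's EvictingBoundedList: keeps only the
// most recent `capacity` failure causes and evicts the oldest one first.
final class BoundedExceptionHistory {
    private final int capacity;
    private final Deque<Throwable> entries = new ArrayDeque<>();

    BoundedExceptionHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Record the root cause of a failure that triggered a restart.
    void add(Throwable rootCause) {
        if (entries.size() == capacity) {
            entries.removeFirst(); // evict the oldest entry
        }
        entries.addLast(rootCause);
    }

    // Snapshot ordered from oldest to newest, e.g. for serving to a UI.
    List<Throwable> snapshot() {
        return new ArrayList<>(entries);
    }

    public static void main(String[] args) {
        BoundedExceptionHistory history = new BoundedExceptionHistory(2);
        history.add(new RuntimeException("first"));
        history.add(new IllegalStateException("second"));
        history.add(new RuntimeException("third"));
        // Only the two most recent causes remain in the history.
        for (Throwable t : history.snapshot()) {
            System.out.println(t.getMessage());
        }
    }
}
```

      With a capacity of 2, adding a third cause drops the first one, so memory use stays bounded no matter how often the job restarts.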

      -- Update: January 21, 2021 --

      The UI can already display multiple exceptions through the Exception History. Right now, it lists the one or more exceptions that caused the job to fail. Instead, we could adapt it so that the history contains not only the exceptions of the most recent failure but one expandable entry per restart. If more than one exception is connected to a single restart, their stack traces would be listed within one expandable entry.
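
      One possible shape for "one expandable entry per restart" is sketched below. The class and method names (`RestartHistory`, `RestartEntry`, `renderStackTraces`) are assumptions made for illustration and are not Flink's actual data model. Each entry groups all exceptions connected to one restart and renders their stack traces together.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical data model, not Flink's actual classes: one history entry
// per restart, holding every exception connected to that restart.
final class RestartHistory {

    static final class RestartEntry {
        final long timestampMillis;
        final List<Throwable> exceptions;

        RestartEntry(long timestampMillis, List<Throwable> exceptions) {
            this.timestampMillis = timestampMillis;
            this.exceptions = Collections.unmodifiableList(new ArrayList<>(exceptions));
        }

        // Concatenate the stack traces of all exceptions of this restart,
        // i.e. the content of a single expandable entry.
        String renderStackTraces() {
            StringWriter out = new StringWriter();
            PrintWriter writer = new PrintWriter(out);
            for (Throwable t : exceptions) {
                t.printStackTrace(writer);
            }
            writer.flush();
            return out.toString();
        }
    }

    private final List<RestartEntry> entries = new ArrayList<>();

    // Add one entry for a restart together with all of its exceptions.
    void recordRestart(long timestampMillis, List<Throwable> exceptions) {
        entries.add(new RestartEntry(timestampMillis, exceptions));
    }

    List<RestartEntry> entries() {
        return Collections.unmodifiableList(entries);
    }

    public static void main(String[] args) {
        RestartHistory history = new RestartHistory();
        history.recordRestart(1L, Arrays.asList(
                new RuntimeException("task failed"),
                new IllegalStateException("slot lost")));
        history.recordRestart(2L, Collections.singletonList(
                new RuntimeException("out of memory")));
        for (RestartEntry entry : history.entries()) {
            System.out.println("restart at " + entry.timestampMillis
                    + " with " + entry.exceptions.size() + " exception(s)");
        }
    }
}
```

      Grouping by restart keeps the number of top-level entries equal to the number of restarts, while concurrent failures from the same restart stay together under one entry.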

          People

            mapohl Matthias Pohl
            trohrmann Till Rohrmann