Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-12889

Job keeps in FAILING state

Agile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      There is a topology of 3 operator, such as, source, parser, and persist. Occasionally, 5 subtasks of the source encounters exception and turns to failed, at the same time, one subtask of the parser runs into exception and turns to failed too. The jobmaster gets a message of the parser's failed. The jobmaster then try to cancel all the subtask, most of the subtasks of the three operator turns to canceled except the 5 subtasks of the source, because the state of the 5 ones is already FAILED before jobmaster try to cancel it. Then the jobmaster can not reach a final state but keeps in  Failing state meanwhile the subtask of the source kees in canceling state. 
       
      The job run on a flink 1.7 cluster on yarn, and there is only one tm with 10 slots.
       
      The attached files contains a jm log , tm log and the ui picture.
       
      The exception timestamp is about 2019-06-16 13:42:28.

      Attachments

        1. taskmanager.log
          7.67 MB
          Fan Xinpu
        2. jobmanager.log.2019-06-16.0
          734 kB
          Fan Xinpu
        3. 20190618104945417.jpg
          65 kB
          Fan Xinpu

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            trohrmann Till Rohrmann
            xinpu Fan Xinpu
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0h
              0h
              Logged:
              Time Spent - 0.5h
              0.5h

              Slack

                Issue deployment