FLINK-29708

Enrich Flink Kubernetes Operator CRD error field


Details

    Description

      Problem Statement:

      The FlinkDeployment and FlinkSessionJob CRDs have a CommonStatus error field of type String. Currently, this field stores various errors, such as:

      • CR validation error
      • Missing SessionJob error / missing JobManager deployment error
      • Unknown Job error
      • DeploymentFailedException
      • ReconciliationError, such as a RestClientException raised by Flink internals (for example, FlinkRest and FlinkRuntime)

      Storing each error as a plain string is insufficient. The field should also carry exception metadata so that the operator can handle each error appropriately. For example, it is very useful to know the HttpResponseStatus code from a RestClientException.

      Proposed Solution:

      • The error field should store a JSON object with exception metadata. For example:
      {    
        "type": "JobManagerNotFoundException",    
        "message": "JobManager with leadership ID: 1234 was not found",    
        "stackTrace": "JobManager lost connection at ....",
        "additionalMetadata": {     
          "httpResponseCode": "400"
        },
        "throwableList": [
          {
            "type": "FlinkRuntimeException",
            "message": "other exception"
          },
          ....
        ]
      } 
      • The stackTrace field can be enabled or disabled via a spec change.
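
The proposal above can be sketched roughly as follows. This is only an illustration in Python, not the operator's actual (Java) implementation. The function names error_to_json and http_response_code, the include_stack_trace flag (standing in for the spec setting, whose name this issue does not specify), and the two exception classes are all hypothetical stand-ins. The sketch shows one way to serialize an exception and its cause chain into the proposed layout, and how a consumer might read httpResponseCode while still tolerating legacy plain-string errors.

```python
import json
import traceback
from typing import Optional


def error_to_json(exc: BaseException,
                  include_stack_trace: bool = False,
                  additional_metadata: Optional[dict] = None) -> str:
    """Serialize an exception and its cause chain into the proposed layout."""
    payload = {
        "type": type(exc).__name__,
        "message": str(exc),
    }
    # The stack trace is optional, mirroring the proposed on/off switch.
    if include_stack_trace:
        payload["stackTrace"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
    if additional_metadata:
        payload["additionalMetadata"] = dict(additional_metadata)

    # Walk the explicit cause chain (raise ... from ...) to fill throwableList.
    causes = []
    cause = exc.__cause__
    while cause is not None:
        causes.append({"type": type(cause).__name__, "message": str(cause)})
        cause = cause.__cause__
    if causes:
        payload["throwableList"] = causes
    return json.dumps(payload)


def http_response_code(error_field: str) -> Optional[int]:
    """Return httpResponseCode if present, or None for legacy/plain errors."""
    try:
        parsed = json.loads(error_field)
    except (TypeError, ValueError):
        return None  # Not JSON: treat as a legacy plain-string error.
    if not isinstance(parsed, dict):
        return None
    metadata = parsed.get("additionalMetadata")
    if not isinstance(metadata, dict):
        return None
    try:
        return int(metadata.get("httpResponseCode"))
    except (TypeError, ValueError):
        return None


# Hypothetical stand-ins for the Java exception types named in this issue.
class FlinkRuntimeException(Exception):
    pass


class JobManagerNotFoundException(Exception):
    pass


try:
    try:
        raise FlinkRuntimeException("other exception")
    except FlinkRuntimeException as inner:
        raise JobManagerNotFoundException(
            "JobManager with leadership ID: 1234 was not found") from inner
except JobManagerNotFoundException as exc:
    error_field = error_to_json(
        exc, additional_metadata={"httpResponseCode": "400"})
```

Here http_response_code(error_field) yields the integer status code, while a legacy value such as a bare error message yields None, so a consumer could handle both formats during a migration. Whether the real field should remain backward compatible in this way is a design question this sketch does not settle.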


          People

            darenwkt Daren Wong
