FLINK-27358

Kubernetes operator throws NPE when testing with Flink 1.15

    Description

      2022-04-22 10:19:18,307 o.a.f.k.o.c.FlinkDeploymentController [WARN ][default/flink-example-statemachine] Attempt count: 5, last attempt: true
      2022-04-22 10:19:18,329 i.j.o.p.e.ReconciliationDispatcher [ERROR][default/flink-example-statemachine] Error during event processing ExecutionScope{ resource id: CustomResourceID{name='flink-example-statemachine', namespace='default'}, version: 4979543} failed.
      org.apache.flink.kubernetes.operator.exception.ReconciliationException: java.lang.NullPointerException
          at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:110)
          at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:53)
          at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:101)
          at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:76)
          at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
          at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:75)
          at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:143)
          at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:109)
          at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:74)
          at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:50)
          at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:349)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at java.base/java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.NullPointerException
          at org.apache.flink.kubernetes.operator.utils.FlinkUtils.lambda$deleteJobGraphInKubernetesHA$0(FlinkUtils.java:253)
          at java.base/java.util.ArrayList.forEach(Unknown Source)
          at org.apache.flink.kubernetes.operator.utils.FlinkUtils.deleteJobGraphInKubernetesHA(FlinkUtils.java:248)
          at org.apache.flink.kubernetes.operator.service.FlinkService.submitApplicationCluster(FlinkService.java:130)
          at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.deployFlinkJob(ApplicationReconciler.java:205)
          at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.restoreFromLastSavepoint(ApplicationReconciler.java:218)
          at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.reconcile(ApplicationReconciler.java:117)
          at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.reconcile(ApplicationReconciler.java:56)
          at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:106)
          ... 13 more 

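      The root cause is that the Kubernetes HA implementation changed in Flink 1.15: when a job is cancelled, the data of the leader ConfigMap is now cleared. The operator's deleteJobGraphInKubernetesHA (FlinkUtils.java:253 in the trace above) then appears to dereference that missing data while iterating over the HA ConfigMaps, which produces the NullPointerException.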
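
      The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the operator's actual code or its fix: ConfigMapStub, removeJobGraphKeys and the "jobGraph-" key prefix are hypothetical names chosen for illustration, and a plain Map stands in for the ConfigMap data. The only point is that a cleanup loop has to tolerate a ConfigMap whose data is null or empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JobGraphCleanupSketch {

    // Assumed key prefix for job graph entries; not taken from the issue.
    static final String JOB_GRAPH_KEY_PREFIX = "jobGraph-";

    /** Hypothetical stand-in for a Kubernetes ConfigMap: a name plus data that may be null. */
    static final class ConfigMapStub {
        final String name;
        final Map<String, String> data; // null models a ConfigMap whose data was cleared

        ConfigMapStub(String name, Map<String, String> data) {
            this.name = name;
            this.data = data;
        }
    }

    /**
     * Removes job graph entries from every ConfigMap and returns how many were removed.
     * ConfigMaps without data are skipped instead of being dereferenced.
     */
    static int removeJobGraphKeys(List<ConfigMapStub> configMaps) {
        int removed = 0;
        for (ConfigMapStub cm : configMaps) {
            Map<String, String> data = cm.data;
            if (data == null || data.isEmpty()) {
                continue; // nothing to clean up, and no NullPointerException
            }
            int before = data.size();
            data.keySet().removeIf(key -> key.startsWith(JOB_GRAPH_KEY_PREFIX));
            removed += before - data.size();
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, String> populated = new HashMap<>();
        populated.put("jobGraph-0000", "pointer-to-job-graph");
        populated.put("leader-info", "kept");

        List<ConfigMapStub> configMaps = new ArrayList<>();
        configMaps.add(new ConfigMapStub("with-data", populated));
        configMaps.add(new ConfigMapStub("cleared", null));

        System.out.println("removed entries: " + removeJobGraphKeys(configMaps));
    }
}
```

      Without the null check, the second stub would trigger the same kind of NullPointerException inside the loop body as the one in the trace.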

            People

              wangyang0918 Yang Wang