Kafka / KAFKA-9385

Connect cluster: connector tasks run in duplicate, resembling a split-brain cluster problem

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: KafkaConnect
    • Labels: None

      Description

      I am using Debezium and have found a duplicate-task problem. Related Debezium issue: [DBZ-1573|https://issues.redhat.com/browse/DBZ-1573?jql=key%20in%20watchedIssues()]

       

      1. I pushed the Debezium image to our private image repository.

      2. I deployed the Connect cluster with the following DeploymentConfig:

      apiVersion: apps.openshift.io/v1
      kind: DeploymentConfig
      metadata:
        annotations:
          openshift.io/generated-by: OpenShiftWebConsole
        creationTimestamp: '2019-10-14T07:45:41Z'
        generation: 29
        labels:
          app: debezium-test-cloud
        name: debezium-test-cloud
        namespace: test
        resourceVersion: '168496156'
        selfLink: >-
          /apis/apps.openshift.io/v1/namespaces/test/deploymentconfigs/debezium-test-cloud
        uid: 9f4f8f4d-ee56-11e9-a5a1-00163e0e008f
      spec:
        replicas: 2
        selector:
          app: debezium-test-cloud
          deploymentconfig: debezium-test-cloud
        strategy:
          activeDeadlineSeconds: 21600
          resources: {}
          rollingParams:
            intervalSeconds: 1
            maxSurge: 25%
            maxUnavailable: 25%
            timeoutSeconds: 600
            updatePeriodSeconds: 1
          type: Rolling
        template:
          metadata:
            annotations:
              openshift.io/generated-by: OpenShiftWebConsole
            creationTimestamp: null
            labels:
              app: debezium-test-cloud
              deploymentconfig: debezium-test-cloud
          spec:
            containers:
              - env:
                  - name: BOOTSTRAP_SERVERS
                    value: '192.168.100.228:9092'
                  - name: GROUP_ID
                    value: test-cloud
                  - name: CONFIG_STORAGE_TOPIC
                    value: base.test-cloud.config
                  - name: OFFSET_STORAGE_TOPIC
                    value: base.test-cloud.offset
                  - name: STATUS_STORAGE_TOPIC
                    value: base.test-cloud.status
                  - name: CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE
                    value: 'true'
                  - name: CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE
                    value: 'true'
                  - name: CONNECT_PRODUCER_MAX_REQUEST_SIZE
                    value: '20971520'
                  - name: CONNECT_DATABASE_HISTORY_KAFKA_RECOVERY_POLL_INTERVAL_MS
                    value: '1000'
                  - name: HEAP_OPTS
                    value: '-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0'
                image: 'registry.cn-hangzhou.aliyuncs.com/eshine/debeziumconnect:1.0.0.Beta2'
                imagePullPolicy: IfNotPresent
                name: debezium-test-cloud
                ports:
                  - containerPort: 8083
                    protocol: TCP
                  - containerPort: 8778
                    protocol: TCP
                  - containerPort: 9092
                    protocol: TCP
                  - containerPort: 9779
                    protocol: TCP
                resources:
                  limits:
                    cpu: 400m
                    memory: 1Gi
                  requests:
                    cpu: 200m
                    memory: 1Gi
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                  - mountPath: /kafka/config
                    name: debezium-test-cloud-1
                  - mountPath: /kafka/data
                    name: debezium-test-cloud-2
                  - mountPath: /kafka/logs
                    name: debezium-test-cloud-3
            dnsPolicy: ClusterFirst
            restartPolicy: Always
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
            volumes:
              - emptyDir: {}
                name: debezium-test-cloud-1
              - emptyDir: {}
                name: debezium-test-cloud-2
              - emptyDir: {}
                name: debezium-test-cloud-3
        test: false
        triggers:
          - type: ConfigChange
      status:
        availableReplicas: 2
        conditions:
          - lastTransitionTime: '2019-11-25T06:44:30Z'
            lastUpdateTime: '2019-11-25T06:44:44Z'
            message: replication controller "debezium-test-cloud-15" successfully rolled out
            reason: NewReplicationControllerAvailable
            status: 'True'
            type: Progressing
          - lastTransitionTime: '2019-12-31T10:06:23Z'
            lastUpdateTime: '2019-12-31T10:06:23Z'
            message: Deployment config has minimum availability.
            status: 'True'
            type: Available
        details:
          causes:
            - type: Manual
          message: manual change
        latestVersion: 15
        observedGeneration: 29
        readyReplicas: 2
        replicas: 2
        unavailableReplicas: 0
        updatedReplicas: 2
      

      3. The Connect cluster runs in OpenShift as one service with two pods.

      4. The problem then occurred as follows:

           a) task_connector_1_0 and task_connector_3_0 were running in PodA; task_connector_2_0 was running in PodB.

           b) PodA's console then logged the error shown in the attachment "12_31_d8c7j_1.jpg".

           c) A rebalance then started.

           d) However, all three tasks (task_connector_1_0, task_connector_2_0, task_connector_3_0) were then running in PodB, while PodA was still running task_connector_1_0 and task_connector_3_0.

           e) As a result, task_connector_1_0 and task_connector_3_0 were each running on both pods at the same time.
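
      The duplicate assignment described in step 4 can be illustrated with a small sketch. It assumes the set of running tasks on each pod has already been collected by hand (for example, from each pod's console log); the function name `find_duplicate_tasks` and the `observed` mapping are hypothetical and are not part of Kafka Connect or Debezium.

```python
from collections import defaultdict


def find_duplicate_tasks(tasks_by_pod):
    """Return {task_id: [pods]} for every task observed on more than one pod.

    tasks_by_pod is a hypothetical, hand-collected mapping of
    pod name -> list of task ids seen running on that pod.
    """
    pods_by_task = defaultdict(set)
    for pod, tasks in tasks_by_pod.items():
        for task in tasks:
            pods_by_task[task].add(pod)
    # Keep only the tasks that appear on two or more pods.
    return {
        task: sorted(pods)
        for task, pods in pods_by_task.items()
        if len(pods) > 1
    }


# State reported in step 4 d): PodB runs all tasks, PodA still runs two of them.
observed = {
    "PodA": ["task_connector_1_0", "task_connector_3_0"],
    "PodB": ["task_connector_1_0", "task_connector_2_0", "task_connector_3_0"],
}

duplicates = find_duplicate_tasks(observed)
for task, pods in duplicates.items():
    print(f"{task} is running on: {', '.join(pods)}")
```

      With the state from step 4 d), the sketch flags task_connector_1_0 and task_connector_3_0 as running on both pods, while task_connector_2_0 runs only on PodB.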

       

          

        Attachments

        1. 12_31_d8c7j_1.jpg
          524 kB
          kaikai.hou

              People

              • Assignee: Unassigned
              • Reporter: kaikai.hou