[KAFKA-9385] Connect cluster: connector task repeat like a splitbrain cluster problem - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: connect
Labels:
None

Description

I am using Debezium and have run into a problem where connector tasks are duplicated across workers. Related Debezium issue: [DBZ-1573|https://issues.redhat.com/browse/DBZ-1573?jql=key%20in%20watchedIssues()]

1. I pushed the Debezium image to our private image repository.

2. I deployed the Connect cluster with the following DeploymentConfig:

apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftWebConsole
  creationTimestamp: '2019-10-14T07:45:41Z'
  generation: 29
  labels:
    app: debezium-test-cloud
  name: debezium-test-cloud
  namespace: test
  resourceVersion: '168496156'
  selfLink: >-
    /apis/apps.openshift.io/v1/namespaces/test/deploymentconfigs/debezium-test-cloud
  uid: 9f4f8f4d-ee56-11e9-a5a1-00163e0e008f
spec:
  replicas: 2
  selector:
    app: debezium-test-cloud
    deploymentconfig: debezium-test-cloud
  strategy:
    activeDeadlineSeconds: 21600
    resources: {}
    rollingParams:
      intervalSeconds: 1
      maxSurge: 25%
      maxUnavailable: 25%
      timeoutSeconds: 600
      updatePeriodSeconds: 1
    type: Rolling
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftWebConsole
      creationTimestamp: null
      labels:
        app: debezium-test-cloud
        deploymentconfig: debezium-test-cloud
    spec:
      containers:
        - env:
            - name: BOOTSTRAP_SERVERS
              value: '192.168.100.228:9092'
            - name: GROUP_ID
              value: test-cloud
            - name: CONFIG_STORAGE_TOPIC
              value: base.test-cloud.config
            - name: OFFSET_STORAGE_TOPIC
              value: base.test-cloud.offset
            - name: STATUS_STORAGE_TOPIC
              value: base.test-cloud.status
            - name: CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE
              value: 'true'
            - name: CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE
              value: 'true'
            - name: CONNECT_PRODUCER_MAX_REQUEST_SIZE
              value: '20971520'
            - name: CONNECT_DATABASE_HISTORY_KAFKA_RECOVERY_POLL_INTERVAL_MS
              value: '1000'
            - name: HEAP_OPTS
              value: '-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0'
          image: 'registry.cn-hangzhou.aliyuncs.com/eshine/debeziumconnect:1.0.0.Beta2'
          imagePullPolicy: IfNotPresent
          name: debezium-test-cloud
          ports:
            - containerPort: 8083
              protocol: TCP
            - containerPort: 8778
              protocol: TCP
            - containerPort: 9092
              protocol: TCP
            - containerPort: 9779
              protocol: TCP
          resources:
            limits:
              cpu: 400m
              memory: 1Gi
            requests:
              cpu: 200m
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /kafka/config
              name: debezium-test-cloud-1
            - mountPath: /kafka/data
              name: debezium-test-cloud-2
            - mountPath: /kafka/logs
              name: debezium-test-cloud-3
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - emptyDir: {}
          name: debezium-test-cloud-1
        - emptyDir: {}
          name: debezium-test-cloud-2
        - emptyDir: {}
          name: debezium-test-cloud-3
  test: false
  triggers:
    - type: ConfigChange
status:
  availableReplicas: 2
  conditions:
    - lastTransitionTime: '2019-11-25T06:44:30Z'
      lastUpdateTime: '2019-11-25T06:44:44Z'
      message: replication controller "debezium-test-cloud-15" successfully rolled out
      reason: NewReplicationControllerAvailable
      status: 'True'
      type: Progressing
    - lastTransitionTime: '2019-12-31T10:06:23Z'
      lastUpdateTime: '2019-12-31T10:06:23Z'
      message: Deployment config has minimum availability.
      status: 'True'
      type: Available
  details:
    causes:
      - type: Manual
    message: manual change
  latestVersion: 15
  observedGeneration: 29
  readyReplicas: 2
  replicas: 2
  unavailableReplicas: 0
  updatedReplicas: 2

3. The Connect cluster runs in OpenShift as one service with two pods.

4. The following sequence occurred:

a) task_connector_1_0 and task_connector_3_0 were running in PodA; task_connector_2_0 was running in PodB.

b) Then the PodA console logged an error; see the attachment "12_31_d8c7j_1.jpg".

c) Then a rebalance started.

d) However, after the rebalance PodB was running all three tasks (task_connector_1_0, task_connector_2_0, task_connector_3_0), while PodA was still running task_connector_1_0 and task_connector_3_0.

e) As a result, task_connector_1_0 and task_connector_3_0 were each running on both pods at the same time.
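
To make the duplication in step 4 concrete, here is a minimal Python sketch that takes the tasks observed on each pod and reports which ones run on more than one pod. The function name `find_duplicate_tasks` and the `observed` mapping are hypothetical and only illustrate the report; the sketch does not query Kafka Connect, and the task sets are copied by hand from step d).

```python
from collections import defaultdict


def find_duplicate_tasks(tasks_by_worker):
    """Return {task_id: sorted workers} for tasks seen on more than one worker."""
    workers_by_task = defaultdict(set)
    for worker, tasks in tasks_by_worker.items():
        for task in tasks:
            workers_by_task[task].add(worker)
    # Keep only tasks that are claimed by two or more workers.
    return {
        task: sorted(workers)
        for task, workers in workers_by_task.items()
        if len(workers) > 1
    }


# Hypothetical input: the state described in step d), entered by hand.
observed = {
    "PodA": {"task_connector_1_0", "task_connector_3_0"},
    "PodB": {"task_connector_1_0", "task_connector_2_0", "task_connector_3_0"},
}

if __name__ == "__main__":
    for task, workers in sorted(find_duplicate_tasks(observed).items()):
        print(task, "runs on", ", ".join(workers))
```

With the state from step d), only task_connector_2_0 is absent from the result, since it runs on PodB alone.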

Attachments

12_31_d8c7j_1.jpg
08/Jan/20 10:24
524 kB
kaikai.hou

Issue Links

relates to

KAFKA-9184 Redundant task creation and periodic rebalances after zombie worker rejoins the group

Resolved
