Spark / SPARK-22991

High read latency with spark streaming 2.2.1 and kafka 0.10.0.1


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels: None

    Description

      Spark 2.2.1 + Kafka 0.10 + Spark streaming.

      Batch duration is 1s, max rate per partition is 500, poll interval is 120 seconds, max poll records is 500, the number of Kafka partitions is 500, and the cached consumer is enabled.
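For reference, the settings above can be written out as a rough sketch. The property names are the standard Spark and Kafka configuration keys, but mapping the report's wording onto them is an assumption (in particular, that "poll interval is 120 seconds" means `spark.streaming.kafka.consumer.poll.ms=120000`). The two derived numbers are plain arithmetic on those values, not measurements.

```python
# Settings as described in the report. Mapping the report's wording onto
# these configuration keys is an assumption, not something the report states.
BATCH_DURATION_MS = 1000   # "Batch duration is 1s"
NUM_PARTITIONS = 500       # number of Kafka partitions

spark_conf = {
    # records per second, per partition
    "spark.streaming.kafka.maxRatePerPartition": "500",
    # assumed meaning of "poll interval is 120 seconds"
    "spark.streaming.kafka.consumer.poll.ms": "120000",
    # "enabled cache consumer"
    "spark.streaming.kafka.consumer.cache.enabled": "true",
}

kafka_params = {
    "max.poll.records": "500",
}

# Upper bound on records in one batch: rate * partitions * batch length.
max_rate = int(spark_conf["spark.streaming.kafka.maxRatePerPartition"])
max_records_per_batch = max_rate * NUM_PARTITIONS * BATCH_DURATION_MS // 1000

# Number of batch intervals that elapse while one poll blocks for its
# full timeout (worst case under the assumed poll.ms value).
poll_ms = int(spark_conf["spark.streaming.kafka.consumer.poll.ms"])
batches_per_poll_timeout = poll_ms // BATCH_DURATION_MS

print(max_records_per_batch)      # 250000
print(batches_per_poll_timeout)   # 120
```

Under these assumptions a single batch is capped at 250,000 records, and one read that blocks for the whole poll timeout spans 120 batch intervals, which is consistent with batches piling up in the queue when reads stall.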

      While reading data from Kafka, we intermittently observe very high read latencies. The high latencies cause the Kafka consumer session to expire, so the Kafka brokers remove the consumer from the group. The consumer keeps retrying and finally fails with the following messages:

      [org.apache.kafka.clients.NetworkClient] - Disconnecting from node 12 due to request timeout
      [org.apache.kafka.clients.NetworkClient] - Cancelled request ClientRequest
      [org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient] - Cancelled FETCH request ClientRequest.

      As a result, many batches are stuck in the queued state.

      The high read latencies occur whenever multiple clients read data in parallel from the same Kafka cluster. The Kafka cluster has a large number of brokers and can support high network bandwidth.

      When running Spark 1.5 with the Kafka 0.8 consumer client against the same Kafka cluster, we do not see these read latencies.

          People

            Assignee: Unassigned
            Reporter: Kiran Shivappa Japannavar (kiranjapannavar)