Kafka / KAFKA-1417

Very slow initial high-level consumer startup in low traffic/blocking fetch scenario

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.1
    • Component/s: consumer
    • Labels:
      None

      Description

      We're seeing very slow startup times when starting a high-level consumer in a low-traffic, blocking-fetch setup. The example we've come across is a consumer that is set up to use 3 topics and fetches with a 20s max wait and a 1-byte minimum. The leader finder thread adds partitions one by one and, since the offset is not yet known, each add triggers a call to look up the offset. This call uses the fetcher thread's simple consumer instance and locks around the call. Initially this is not a problem, but as soon as the fetcher thread has some partitions it starts fetching, and since this is a low-traffic situation the fetch will at least sometimes take up to 20s (again locking around the simple consumer). This leads to behavior like:

      1. Finder thread adds a partition
      2. Fetcher thread notices it has partitions to fetch data for, locks the consumer for up to 20s
      3. Finder thread tries to add a partition, tries to lock consumer and blocks for 20s
      4. Rinse, repeat for each partition
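
The four steps above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Kafka code: `SharedConsumerContention`, `consumerLock`, and the two semaphores are hypothetical names, a `ReentrantLock` stands in for the lock around the shared simple consumer, and `Thread.sleep` stands in for a fetch that blocks because no data arrives. The handshake forces the worst case, where the fetcher always starts a fetch before the finder tries to add the next partition.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class SharedConsumerContention {
    // Hypothetical stand-in for the lock around the one shared simple consumer.
    private final ReentrantLock consumerLock = new ReentrantLock();
    private final Semaphore partitionAdded = new Semaphore(0);
    private final Semaphore fetchStarted = new Semaphore(0);
    int partitionsAdded = 0;
    int blockedAdds = 0; // adds that found the consumer locked by a fetch

    /** Adds partitions one by one and returns the finder's elapsed time in ms. */
    long run(int partitionCount, long fetchWaitMs) throws InterruptedException {
        Thread fetcher = new Thread(() -> {
            try {
                while (true) {
                    partitionAdded.acquire();      // step 2: notices it has partitions
                    consumerLock.lock();
                    try {
                        fetchStarted.release();
                        Thread.sleep(fetchWaitMs); // blocking fetch, no data arrives
                    } finally {
                        consumerLock.unlock();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shut down
            }
        });
        fetcher.start();

        long start = System.nanoTime();
        for (int p = 0; p < partitionCount; p++) {
            if (!consumerLock.tryLock()) {         // step 3: consumer is busy fetching
                blockedAdds++;
                consumerLock.lock();               // blocks until the fetch returns
            }
            try {
                partitionsAdded++;                 // steps 1 and 3: offset lookup + add
            } finally {
                consumerLock.unlock();
            }
            if (p < partitionCount - 1) {
                partitionAdded.release();
                fetchStarted.acquire();            // force the worst-case interleaving
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        fetcher.interrupt();
        fetcher.join();
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedConsumerContention sim = new SharedConsumerContention();
        long elapsedMs = sim.run(4, 200); // 200 ms stands in for the 20s wait
        System.out.println("blocked adds: " + sim.blockedAdds + ", elapsed ms: " + elapsedMs);
    }
}
```

In this model every add after the first waits for a full fetch, so startup time grows roughly as (partitions - 1) × fetch wait.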

        Activity

        Sam Meder added a comment -

        The simple, although not the most efficient, solution would be to add another simple consumer instance in each fetcher...

        Jun Rao added a comment -

        Interesting. The problem is that the leaderFinderThread uses the same SimpleConsumer (used by the fetcher thread) when issuing the OffsetBefore request. We could somehow let them use different SimpleConsumer instances. Not sure if this is the best solution though.

        Also, is there a particular reason that you use a 20s maxwait in the fetch request?
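
A rough sketch of the "different SimpleConsumer instances" idea follows. It is not the attached patch or Kafka's implementation: `StubConsumer` is a hypothetical stand-in whose requests all hold one per-instance lock, and `blockedLookups` is an illustrative helper that counts how many offset lookups had to wait while a fetch was in flight.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class SeparateOffsetConsumer {
    /** Hypothetical stand-in for a simple consumer: every request holds the instance lock. */
    static final class StubConsumer {
        private final ReentrantLock lock = new ReentrantLock();

        void blockingFetch(long waitMs, CountDownLatch started) throws InterruptedException {
            lock.lock();
            try {
                started.countDown();
                Thread.sleep(waitMs); // fetch that waits for data that never arrives
            } finally {
                lock.unlock();
            }
        }

        /** Returns true if this lookup had to wait for another request to finish. */
        boolean offsetLookup() {
            boolean waited = !lock.tryLock();
            if (waited) {
                lock.lock();
            }
            try {
                return waited;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Counts offset lookups that were blocked by an in-flight fetch. */
    static int blockedLookups(boolean shareConsumer, int partitions, long fetchWaitMs)
            throws InterruptedException {
        StubConsumer fetchConsumer = new StubConsumer();
        // The proposed change: give offset lookups their own consumer instance.
        StubConsumer offsetConsumer = shareConsumer ? fetchConsumer : new StubConsumer();
        CountDownLatch fetchStarted = new CountDownLatch(1);

        Thread fetcher = new Thread(() -> {
            try {
                fetchConsumer.blockingFetch(fetchWaitMs, fetchStarted);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();
        fetchStarted.await(); // the fetch now holds fetchConsumer's lock

        int blocked = 0;
        for (int p = 0; p < partitions; p++) {
            if (offsetConsumer.offsetLookup()) {
                blocked++;
            }
        }
        fetcher.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared: " + blockedLookups(true, 3, 200)
                + ", separate: " + blockedLookups(false, 3, 200));
    }
}
```

With a shared instance the first lookup waits out the fetch (1 blocked lookup); with a separate instance none of them wait. The cost is an extra connection per fetcher, which is presumably why this is described as not the most efficient option.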

        Sam Meder added a comment -

        I think the timeout is somewhat arbitrary, but since we react to any data (1 byte requirement) we don't want to be doing a whole bunch of unnecessary fetches if there is no data. I'm going to implement the simple second consumer approach and attach a patch.

        Guozhang Wang added a comment -

        In 0.8.1, the leader finder thread adds partitions in batches rather than one by one. Would this help your case?
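
To see why batching could help, here is a simplified model, not the 0.8.1 code: `BatchedPartitionAdd` and its methods are hypothetical, and the sketch assumes each add operation takes the shared lock exactly once. Each acquisition is a point where the finder may wait behind a blocking fetch, so fewer acquisitions means fewer potential 20s stalls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class BatchedPartitionAdd {
    // Hypothetical stand-in for the lock the finder contends on with the fetcher.
    private final ReentrantLock sharedLock = new ReentrantLock();
    private final List<Integer> partitions = new ArrayList<>();
    private int acquisitions = 0;

    /** One lock acquisition per partition: each one may wait behind a fetch. */
    void addOneByOne(List<Integer> newPartitions) {
        for (int p : newPartitions) {
            sharedLock.lock();
            try {
                acquisitions++;
                partitions.add(p);
            } finally {
                sharedLock.unlock();
            }
        }
    }

    /** One lock acquisition for the whole batch. */
    void addBatch(List<Integer> newPartitions) {
        sharedLock.lock();
        try {
            acquisitions++;
            partitions.addAll(newPartitions);
        } finally {
            sharedLock.unlock();
        }
    }

    int acquisitions() { return acquisitions; }
    int partitionCount() { return partitions.size(); }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(0, 1, 2, 3);
        BatchedPartitionAdd oneByOne = new BatchedPartitionAdd();
        oneByOne.addOneByOne(ids);
        BatchedPartitionAdd batched = new BatchedPartitionAdd();
        batched.addBatch(ids);
        System.out.println("one-by-one acquisitions: " + oneByOne.acquisitions()
                + ", batched acquisitions: " + batched.acquisitions());
    }
}
```

For four partitions the one-by-one path takes the lock four times and the batched path once, so under this assumption the batched path can stall behind at most one fetch.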

        Sam Meder added a comment -

        It should, let me take a look at the 0.8.1 code.

        Sam Meder added a comment -

        Looks fine in 0.8.1


          People

          • Assignee:
            Neha Narkhede
            Reporter:
            Sam Meder
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
