Apache Storm / STORM-4016

Kafka spout: start using poll(Duration)


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.1
    • Component/s: storm-kafka
    • Labels: None

    Description

      Kafka has deprecated poll(long) in favour of poll(Duration); see KIP-266: Fix consumer indefinite blocking behavior.

      KIP-266 also gives an interesting account of how poll behaves:

      The pre-existing variant poll(long timeout) would block indefinitely for metadata updates if they were needed, then it would issue a fetch and poll for timeout ms for new records. The initial indefinite metadata block caused applications to become stuck when the brokers became unavailable. The existence of the timeout parameter made the indefinite block especially unintuitive.

      We will add a new method poll(Duration timeout) with the semantics:

      1. iff a metadata update is needed:
        1. send (asynchronous) metadata requests
        2. poll for metadata responses (counts against timeout)
          • if no response within timeout, return an empty collection immediately
      2. if there is fetch data available, return it immediately
      3. if there is no fetch request in flight, send fetch requests
      4. poll for fetch responses (counts against timeout)
        • if no response within timeout, return an empty collection (leaving async fetch request for the next poll)
        • if we get a response, return the response
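The steps listed above can be sketched as a toy model in which one time budget is shared by the metadata wait and the fetch wait. This is not Kafka code: the class `BoundedPollModel`, its fields, and the simulated latencies are hypothetical, and simulated time only advances inside `poll`, so step 2 (data already buffered) never occurs separately and is folded into step 4.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;

/** Toy model of the poll(Duration) steps. Not Kafka code; every name here is hypothetical. */
public class BoundedPollModel {
    private long metadataWaitMs;       // simulated ms until the metadata response arrives; 0 = none needed
    private long fetchWaitMs = -1;     // simulated ms until the in-flight fetch completes; -1 = none in flight
    private final long fetchLatencyMs; // simulated round-trip time of one fetch request

    public BoundedPollModel(long metadataLatencyMs, long fetchLatencyMs) {
        this.metadataWaitMs = metadataLatencyMs;
        this.fetchLatencyMs = fetchLatencyMs;
    }

    public List<String> poll(Duration timeout) {
        long budgetMs = timeout.toMillis();

        // Step 1: waiting for metadata counts against the timeout.
        if (metadataWaitMs > 0) {
            long metadataWaited = Math.min(metadataWaitMs, budgetMs);
            metadataWaitMs -= metadataWaited;
            budgetMs -= metadataWaited;
            if (metadataWaitMs > 0) {
                // Budget exhausted: return empty, the metadata request stays in flight.
                return Collections.emptyList();
            }
        }

        // Step 3: send a fetch request only if none is in flight.
        if (fetchWaitMs < 0) {
            fetchWaitMs = fetchLatencyMs;
        }

        // Step 4: wait for the fetch response with whatever budget is left.
        long fetchWaited = Math.min(fetchWaitMs, budgetMs);
        fetchWaitMs -= fetchWaited;
        if (fetchWaitMs > 0) {
            // No response within the timeout: the fetch is left for the next poll.
            return Collections.emptyList();
        }
        fetchWaitMs = -1;
        return Collections.singletonList("record");
    }
}
```

With a simulated metadata latency of 250 ms, a fetch latency of 100 ms, and a 100 ms timeout, the first three calls return an empty list and the fourth returns the record. No single call waits longer than its timeout.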

      We will deprecate the original method, poll(long timeout), and we will not change its semantics, so it remains:

      1. iff a metadata update is needed:
        1. send (asynchronous) metadata requests
        2. poll for metadata responses indefinitely until we get it
      2. if there is fetch data available, return it immediately
      3. if there is no fetch request in flight, send fetch requests
      4. poll for fetch responses (counts against timeout)
        • if no response within timeout, return an empty collection (leaving async fetch request for the next poll)
        • if we get a response, return the response
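To make the difference between the two variants concrete, here is a minimal sketch that computes how long a single call could block under each set of semantics. It is a back-of-the-envelope model rather than Kafka code: the class `PollBlockingComparison`, its method names, and the latency parameters are assumptions made for illustration.

```java
/** Worst-case blocking time of one poll call under each semantics. Hypothetical model, not Kafka code. */
public class PollBlockingComparison {

    /** poll(long): the metadata wait is unbounded; only the fetch wait honours the timeout. */
    public static long legacyBlockedMs(long metadataLatencyMs, long fetchLatencyMs, long timeoutMs) {
        return metadataLatencyMs + Math.min(fetchLatencyMs, timeoutMs);
    }

    /** poll(Duration): the metadata wait and the fetch wait share a single budget. */
    public static long boundedBlockedMs(long metadataLatencyMs, long fetchLatencyMs, long timeoutMs) {
        if (metadataLatencyMs >= timeoutMs) {
            return timeoutMs; // gave up on metadata once the budget ran out
        }
        long remainingMs = timeoutMs - metadataLatencyMs;
        return metadataLatencyMs + Math.min(fetchLatencyMs, remainingMs);
    }
}
```

For a broker that takes 60 000 ms to answer a metadata request, a 50 ms fetch, and a 200 ms timeout, the legacy model blocks for 60 050 ms while the bounded model returns after 200 ms. When no metadata update is needed, both models block for the same 50 ms.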

      One notable usage is prohibited by the new poll: previously, you could call poll(0) to block for metadata updates, for example to initialize the client, supposedly without fetching records. Note, though, that this behavior is not according to any contract, and there is no guarantee that poll(0) won't return records the first time it's called. Therefore, it has always been unsafe to ignore the response.
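Since any poll call may return records, code that polls only to initialize a client should keep whatever comes back. The sketch below illustrates that idea against a hypothetical `RecordSource` interface; only the `poll(Duration)` shape mirrors the Kafka consumer, while `isReady`, `warmUp`, and the class name are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Warm-up loop that never discards records. Hypothetical helper, not part of Kafka or Storm. */
public class KeepFirstPollRecords {

    /** Hypothetical stand-in for a consumer; isReady() is an assumed readiness check. */
    public interface RecordSource {
        List<String> poll(Duration timeout);
        boolean isReady();
    }

    /**
     * Polls until the source reports ready or the attempts run out,
     * collecting every record returned along the way instead of ignoring it.
     */
    public static List<String> warmUp(RecordSource source, Duration perPollTimeout, int maxAttempts) {
        List<String> seen = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            seen.addAll(source.poll(perPollTimeout)); // keep records even during warm-up
            if (source.isReady()) {
                break;
            }
        }
        return seen;
    }
}
```

The caller can then process the returned list before entering its normal poll loop. Bounding the loop with `maxAttempts` keeps the warm-up from blocking forever when the brokers are unavailable.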

      https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75974886

          People

            Assignee: Rui Abreu (rabreu)
            Reporter: Rui Abreu (rabreu)
            Votes: 0
            Watchers: 1


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 20m