Apache Storm / STORM-399

Kafka Spout defaulting to latest offset when current offset is older than 100k


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.9.2-incubating
    • Fix Version/s: 0.9.3
    • Component/s: storm-kafka
    • Labels: None

    Description

      Using Storm and storm-kafka 0.9.2-incubating.

      In the storm-kafka spout, the default for maxOffsetBehind is 100000;
      see https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/KafkaConfig.java#L38

      This default is too low and causes the Kafka spout to start from the latest offset instead of the last committed offset, without warning;
      see https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/PartitionManager.java#L95

      This produces the following log output from the Storm worker processes:

      2014-07-09 18:02:15 s.k.PartitionManager [INFO] Read last commit offset from zookeeper: 15266940; old topology_id: ef3f1f89-f64c-4947-b6eb-0c7fb9adb9ea - new topology_id: 5747dba6-c947-4c4f-af4a-4f50a84817bf
      2014-07-09 18:02:15 s.k.PartitionManager [INFO] Last commit offset from zookeeper: 15266940
      2014-07-09 18:02:15 s.k.PartitionManager [INFO] Commit offset 22092614 is more than 100000 behind, resetting to startOffsetTime=-2
      2014-07-09 18:02:15 s.k.PartitionManager [INFO] Starting Kafka prd-use1c-pr-08-kafka-kamq-0004:4 from offset 22092614
      

      To work around this problem, I ended up setting the spout config in my topology like so:

      spoutConf.maxOffsetBehind = Long.MAX_VALUE;
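
      To illustrate why this setting changes the outcome, here is a minimal sketch of the check as described in this report. It is a simplified model, not the actual PartitionManager code: the class OffsetResetSketch and the method chooseStartOffset are hypothetical names, and 22092614 is assumed to be the latest offset in the partition, as the log output above suggests.

```java
// Simplified model of the behavior described in this issue.
// NOT the real storm-kafka code; all names here are hypothetical.
class OffsetResetSketch {

    // Default value of maxOffsetBehind reported for storm-kafka 0.9.2-incubating.
    static final long DEFAULT_MAX_OFFSET_BEHIND = 100000L;

    // Returns the offset the spout would start from under this model:
    // the latest offset if the committed offset lags by more than
    // maxOffsetBehind, otherwise the committed offset.
    static long chooseStartOffset(long committedOffset, long latestOffset, long maxOffsetBehind) {
        boolean tooFarBehind = latestOffset - committedOffset > maxOffsetBehind;
        return tooFarBehind ? latestOffset : committedOffset;
    }

    public static void main(String[] args) {
        long committed = 15266940L; // "Last commit offset from zookeeper" in the log
        long latest = 22092614L;    // assumed latest offset, taken from the log

        // Gap is 6825674, which exceeds the default of 100000: jumps to latest.
        System.out.println(chooseStartOffset(committed, latest, DEFAULT_MAX_OFFSET_BEHIND)); // prints 22092614

        // With Long.MAX_VALUE the gap can never exceed the limit: resumes from the commit.
        System.out.println(chooseStartOffset(committed, latest, Long.MAX_VALUE)); // prints 15266940
    }
}
```

      Under this model, the 6825674 messages between the two offsets are skipped with the default, and none are skipped with Long.MAX_VALUE.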
      

      Why would the Kafka spout, by default, skip to the latest offset if the
      current offset is more than 100000 behind?

      This seems like a bad default value; the spout literally skipped over
      months of data without any warning.

      Are the core contributors open to accepting a pull request that would set
      the default to Long.MAX_VALUE?

          People

            curtissallen Curtis Allen