Spark / SPARK-6051

Add an option for DirectKafkaInputDStream to commit the offsets into ZK

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels:
      None

      Description

      Currently in DirectKafkaInputDStream, offsets are managed by Spark Streaming itself without involving ZK or Kafka, which makes several third-party offset monitoring tools unable to monitor the status of the Kafka consumer. This adds an option to commit the offsets to ZK when each job is finished. The commit is implemented asynchronously, so the main processing flow is not blocked. Already tested with the KafkaOffsetMonitor tool.
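The asynchronous commit described above can be sketched roughly as follows. This is a hedged illustration, not the patch itself: `stream`, `zkClient`, and `groupId` are assumed to be set up elsewhere, and the ZK path layout follows the conventional `/consumers/<group>/offsets/<topic>/<partition>` scheme used by the high-level consumer so monitoring tools can find it.

```scala
import scala.concurrent.{ExecutionContext, Future}

import kafka.utils.ZkUtils
import org.apache.spark.streaming.kafka.HasOffsetRanges

// After each batch, write the consumed offsets to ZK off the main
// processing path, so the batch pipeline is not blocked and tools like
// KafkaOffsetMonitor can observe the consumer's progress.
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  Future {
    offsetRanges.foreach { o =>
      // Conventional high-level-consumer offset path (assumption).
      val path = s"/consumers/$groupId/offsets/${o.topic}/${o.partition}"
      ZkUtils.updatePersistentPath(zkClient, path, o.untilOffset.toString)
    }
  }(ExecutionContext.global)
}
```

Committing `untilOffset` records the position up to which the batch has consumed; because the write is fire-and-forget, a failure between processing and commit can leave ZK slightly behind, which is acceptable for monitoring purposes.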


          Activity

          Apache Spark added a comment -

          User 'jerryshao' has created a pull request for this issue:
          https://github.com/apache/spark/pull/4805

          Dan Dutrow added a comment -

          Is there documentation for how to update the metrics (#messages per batch) in the Spark Streaming tab when using the Direct API? Does the Streaming tab get its information from Zookeeper or something else internally?

          Cody Koeninger added a comment -

          Responded on the mailing list, but for posterity's sake:

          Which version of Spark are you on? I thought that was added to the Spark UI in recent versions.

          The Direct API doesn't have any inherent interaction with ZooKeeper. If you need the number of messages per batch and aren't on a recent enough version of Spark to see them in the UI, you can get them programmatically from the offset ranges. See the definition of count() in recent versions of KafkaRDD for an example.


            People

            • Assignee: Unassigned
            • Reporter: Saisai Shao (jerryshao)
            • Votes: 0
            • Watchers: 5
