SPARK-20036: Impossible to read a whole Kafka topic using Kafka 0.10 and Spark 2.0.0


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.2.0
    • Component/s: Documentation
    • Labels: None

    Description

      I use Kafka 0.10.1 and Java code with the following dependencies:
      <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.11</artifactId>
      <version>0.10.1.1</version>
      </dependency>
      <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>0.10.1.1</version>
      </dependency>
      <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.0.0</version>
      </dependency>
      <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.0.0</version>
      </dependency>
      The code tries to read the whole topic using:
      kafkaParams.put("auto.offset.reset", "earliest");
      Using 5-second batches:
      jssc = new JavaStreamingContext(conf, Durations.seconds(5));
      Each batch returns empty.
      While debugging the code, I noticed that KafkaUtils.fixKafkaParams is called, which overrides "earliest" with "none".
      Whether this is related or not, when I used kafka 0.8 on the client with kafka 0.10.1 on the server, I could read the whole topic.
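      For context, here is a minimal sketch of this kind of setup against the spark-streaming-kafka-0-10 integration. The broker address, topic name, group id, and master URL below are placeholder assumptions, not taken from the attached Main.java. As far as I can tell, fixKafkaParams only rewrites the parameters handed to the executor-side consumers, so with a group id that has no committed offsets, "earliest" should still take effect for the driver-side consumer that computes the offset ranges.
      import java.util.Arrays;
      import java.util.Collection;
      import java.util.HashMap;
      import java.util.Map;

      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.common.serialization.StringDeserializer;
      import org.apache.spark.SparkConf;
      import org.apache.spark.streaming.Durations;
      import org.apache.spark.streaming.api.java.JavaInputDStream;
      import org.apache.spark.streaming.api.java.JavaStreamingContext;
      import org.apache.spark.streaming.kafka010.ConsumerStrategies;
      import org.apache.spark.streaming.kafka010.KafkaUtils;
      import org.apache.spark.streaming.kafka010.LocationStrategies;

      public class ReadWholeTopic {
        public static void main(String[] args) throws InterruptedException {
          // Local master only for testing; adjust for a real cluster.
          SparkConf conf = new SparkConf().setAppName("read-whole-topic").setMaster("local[2]");
          JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

          Map<String, Object> kafkaParams = new HashMap<>();
          kafkaParams.put("bootstrap.servers", "localhost:9092");      // placeholder broker address
          kafkaParams.put("key.deserializer", StringDeserializer.class);
          kafkaParams.put("value.deserializer", StringDeserializer.class);
          kafkaParams.put("group.id", "spark-20036-example");          // fresh group id with no committed offsets
          kafkaParams.put("auto.offset.reset", "earliest");            // start from the beginning of the topic
          kafkaParams.put("enable.auto.commit", false);

          Collection<String> topics = Arrays.asList("test-topic");     // placeholder topic name

          // Direct stream: the driver consumer sees kafkaParams as given; the executor
          // consumers get copies adjusted by fixKafkaParams (auto.offset.reset -> none).
          JavaInputDStream<ConsumerRecord<String, String>> stream =
              KafkaUtils.createDirectStream(
                  jssc,
                  LocationStrategies.PreferConsistent(),
                  ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

          // Print the record count of each 5-second batch.
          stream.foreachRDD(rdd -> System.out.println("records in batch: " + rdd.count()));

          jssc.start();
          jssc.awaitTermination();
        }
      }
      Note that reusing a group id that has already committed offsets at the end of the topic would also produce empty batches, since "auto.offset.reset" only applies when no committed offset exists for the group.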

      Attachments

        1. pom.xml (2 kB) - Daniel Nuriyev
        2. Main.java (3 kB) - Daniel Nuriyev


            People

              Assignee: Cody Koeninger
              Reporter: Daniel Nuriyev
              Votes: 0
              Watchers: 5
