Spark / SPARK-16950

fromOffsets parameter in Kafka's Direct Streams does not work in python3


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.1, 2.1.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: PySpark
    • Labels: None

    Description

      KafkaUtils.createDirectStream fails under Python 3 when the fromOffsets parameter (the starting offsets of the stream in Kafka) is set. Python 3 removed the separate long type, and py4j maps a Python int to either java.lang.Integer or java.lang.Long depending on its magnitude. Small offset values therefore reach the JVM as java.lang.Integer, which causes a ClassCastException where a java.lang.Long is expected.
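      The mismatch can be modelled in a few lines of plain Python. This is a simplified sketch, not py4j's or Spark's actual code: it assumes py4j tags any int in the signed 32-bit range as java.lang.Integer and larger values as java.lang.Long, and that the JVM side casts every offset to java.lang.Long. The functions java_boxed_type and jvm_read_offset, and the topic name my_topic, are hypothetical stand-ins.

```python
# Simplified model (an assumption, not py4j's real code) of how a py4j-like
# bridge boxes a Python 3 int before sending it to the JVM.
JAVA_MIN_INT, JAVA_MAX_INT = -(2**31), 2**31 - 1


def java_boxed_type(value: int) -> str:
    """Return the Java box type assumed to be chosen for `value`."""
    if JAVA_MIN_INT <= value <= JAVA_MAX_INT:
        return "java.lang.Integer"
    # Values outside the 32-bit range are assumed to be sent as Long.
    return "java.lang.Long"


def jvm_read_offset(boxed_type: str, value: int) -> int:
    """Hypothetical JVM side that casts every offset to java.lang.Long."""
    if boxed_type != "java.lang.Long":
        raise TypeError(
            f"ClassCastException: {boxed_type} cannot be cast to java.lang.Long"
        )
    return value


# Hypothetical starting offsets: one small, one beyond the 32-bit range.
from_offsets = {("my_topic", 0): 42, ("my_topic", 1): 5_000_000_000}

for (topic, partition), offset in from_offsets.items():
    boxed = java_boxed_type(offset)
    try:
        jvm_read_offset(boxed, offset)
        print(topic, partition, offset, boxed, "ok")
    except TypeError as err:
        print(topic, partition, offset, boxed, err)
```

      In this model only the small offset (42) fails, which matches the report that the error appears for small offsets. Under Python 2 the caller could pass a long explicitly, so the size-based choice never came into play.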

      This behaviour has been noticed before, and the tests for this functionality are disabled under Python 3: https://github.com/apache/spark/blob/89e67d6667d5f8be9c6fb6c120fbcd350ae2950d/python/pyspark/streaming/tests.py#L1061

    People

      Assignee: Mariusz Strzelecki (szczeles)
      Reporter: Mariusz Strzelecki (szczeles)
      Votes: 0
      Watchers: 4
