SPARK-2377

Create a Python API for Spark Streaming

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: DStreams, PySpark
    • Labels:
      None

      Description

      Spark Streaming currently offers APIs in Scala and Java. It would be a great feature addition to have a Python API as well.

      This is probably a large task that will span many issues if undertaken. This ticket should provide some place to track overall progress towards an initial Python API for Spark Streaming.

          Activity

          tdas Tathagata Das added a comment -

          Thanks for making this JIRA. There has been significant progress made towards this. I am hoping that we can make a public PR of the minimum implementation some time soon, so that others can start filling in the missing functionality. Will post on this JIRA when we have updates.

          farrellee Matthew Farrellee added a comment -

          is this still in progress?

          is the code available somewhere public?

          tdas Tathagata Das added a comment -

          It is in a pre-alpha PR against my repository, done by Kenichi. If you want a preview, you can take a look at it:
          https://github.com/tdas/spark/pull/11

          My plan is that as soon as the Spark 1.1 release madness is over, Josh Rosen, Kenichi Takagiwa, and I are going to work on getting this out. This is a very basic version that people can start playing with and improving upon.

          farrellee Matthew Farrellee added a comment -

          thanks, i'll take a look

          jyotiska Jyotiska NK added a comment - - edited

          I have been watching the work going on in PR #11 for a while. Is there any way to contribute to this?

          farrellee Matthew Farrellee added a comment -

          it's a little tricky. you need to clone tdas' or giwa's repository, make changes on master (it's far from current spark master) and submit pull requests to giwa or tdas.

          imho, it'd be much simpler if the PR was tagged [WIP] and directed toward the apache/spark repo! (pls!)

          davies Davies Liu added a comment -

          Kenichi Takagiwa, I have also started working on this (based on your branch) and will send out a WIP PR soon.

          apachespark Apache Spark added a comment -

          User 'davies' has created a pull request for this issue:
          https://github.com/apache/spark/pull/2538

          giwa Kenichi Takagiwa added a comment -

          I think it is best to discuss this in https://github.com/apache/spark/pull/2538 so that the discussion does not diverge.

          tdas Tathagata Das added a comment -

          Many thanks to Davies Liu and Kenichi Takagiwa for making this happen! It has been merged!

          prabeeshk Prabeesh K added a comment -

          Hi Tathagata Das,
          I would like to start on a Python API for MQTT streaming.
          Which branch can I use for adding this?
          cc: Davies Liu, Kenichi Takagiwa

          giwa Kenichi Takagiwa added a comment -

          Hi Prabeesh K

          Davies Liu is working on the Python API receiverStream(). I think you can leverage some code from here:
          https://github.com/apache/spark/pull/2833

          Python API MQTT Streaming should be a simple wrapper over the actual functionality implemented in Scala.
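
          To illustrate the "simple wrapper" idea in general terms, here is a minimal sketch. It is not code from this thread or from Spark: the class names `FakeScalaMQTTHelper` and `PythonMQTTWrapper`, the `create_stream` method, and the canned payloads are all hypothetical stand-ins. The point is only that the Python layer validates arguments, delegates to the backend, and converts results, while the real work stays on the Scala side.

```python
# Illustrative sketch only: none of these names come from Spark or from this thread.


class FakeScalaMQTTHelper:
    """Hypothetical stand-in for the Scala/JVM side that owns the receiver logic."""

    def create_stream(self, broker_url, topic):
        # A real backend would start an MQTT receiver; this one returns canned bytes.
        return [
            b"hello from " + topic.encode("utf-8"),
            b"broker=" + broker_url.encode("utf-8"),
        ]


class PythonMQTTWrapper:
    """Thin Python layer: validate arguments, delegate, decode results."""

    def __init__(self, backend):
        self._backend = backend

    def create_stream(self, broker_url, topic):
        if not broker_url or not topic:
            raise ValueError("broker_url and topic are required")
        # All receiving logic lives in the backend; Python only converts bytes to str.
        raw = self._backend.create_stream(broker_url, topic)
        return [payload.decode("utf-8") for payload in raw]


wrapper = PythonMQTTWrapper(FakeScalaMQTTHelper())
messages = wrapper.create_stream("tcp://localhost:1883", "sensors")
print(messages)
```

          In an actual PySpark module the backend would presumably be reached through the JVM gateway rather than a plain Python object; that part is deliberately left out here.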


            People

            • Assignee:
              giwa Kenichi Takagiwa
            • Reporter:
              nchammas Nicholas Chammas
            • Votes:
              3
            • Watchers:
              18
