[SPARK-5037] support dynamic loading of input DStreams in pyspark streaming - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Auto Closed
Affects Version/s: 1.2.0
Fix Version/s: None
Component/s: DStreams, PySpark
Labels:
- bulk-closed

Description

The Scala and Java streaming APIs support "external" InputDStreams (e.g. the ZeroMQReceiver example) through a number of mechanisms, for instance by extending ActorReceiver or by subclassing Receiver directly. The PySpark streaming API does not currently allow similar flexibility; it is limited to file-backed text and binary streams and socket text streams.

It would be great to open up the PySpark streaming API to other stream sources, bringing it closer to parity with the JVM APIs.

One way of doing this could be to support dynamically loading InputDStream implementations through reflection at the JVM level, analogous to how the Hadoop methods in the regular PySpark context.py currently handle Hadoop InputFormats, which are passed in by fully qualified class name.
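The reflection-based loading proposed above can be sketched on the JVM side roughly as follows. This is a hypothetical illustration, not the code from the linked pull request: `ReflectiveSourceLoader`, `StreamSource`, `CounterSource` and the convention of a public `Map<String, String>` constructor are all assumptions made for the example, and no Spark classes are involved. It shows a class being looked up by its fully qualified name, checked against an expected interface, and constructed with a configuration map, much as a Python caller could pass only a class name string plus some settings across to the JVM.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

/**
 * Hypothetical sketch: instantiate a user-supplied stream source by its
 * fully qualified class name. StreamSource and CounterSource are
 * illustrative stand-ins, not Spark types.
 */
public class ReflectiveSourceLoader {

    /** Hypothetical contract every dynamically loaded source must satisfy. */
    public interface StreamSource {
        String describe();
    }

    /** Example implementation; in practice it would ship in a separate JAR. */
    public static class CounterSource implements StreamSource {
        private final Map<String, String> conf;

        public CounterSource(Map<String, String> conf) {
            this.conf = conf;
        }

        @Override
        public String describe() {
            return "counter(start=" + conf.getOrDefault("start", "0") + ")";
        }
    }

    /**
     * Resolves className via reflection, checks that it implements
     * StreamSource, and invokes its public (Map) constructor.
     */
    public static StreamSource load(String className, Map<String, String> conf)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        if (!StreamSource.class.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(className + " does not implement StreamSource");
        }
        Constructor<?> ctor = cls.getConstructor(Map.class);
        return (StreamSource) ctor.newInstance(conf);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // The binary name of a nested class uses '$' as the separator.
        String name = ReflectiveSourceLoader.class.getName() + "$CounterSource";
        StreamSource src = load(name, Map.of("start", "5"));
        System.out.println(src.describe()); // prints: counter(start=5)
    }
}
```

In this sketch only a string and a map cross the boundary, so the calling side never needs compile-time knowledge of the concrete source; new sources could be added by putting a JAR on the classpath rather than by changing the Python API.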

I'll submit a PR momentarily with my shot at this. Comments and alternative approaches are more than welcome.

Issue Links

links to

[Github] Pull Request #3858 (industrial-sloth)

People

Assignee: Unassigned
Reporter: Jascha Swisher
Votes: 0
Watchers: 4

Dates

Created: 31/Dec/14 17:25
Updated: 06/Jun/19 13:57
Resolved: 06/Jun/19 13:57