[SPARK-13700] Rdd.mapAsync(): Easily mix Spark and asynchronous transformations - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: None
Fix Version/s: None
Component/s: Spark Core
Labels:
- async
- features
- rdd
- transform

Description

Spark is great for synchronous operations.

But sometimes I need to call a database, web server, etc. from my transformation, and the Spark pipeline stalls while waiting for each response.

Avoiding that would be great!

I suggest we add a new method RDD.mapAsync(), which can execute these operations concurrently, avoiding the bottleneck.

I've written a quick'n'dirty implementation of what I have in mind:
https://gist.github.com/paulo-raca/d121cf27905cfb1fafc3

What do you think?

If you agree with this feature, I can work on a pull request.
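A minimal sketch of the idea, assuming Scala 2.12/2.13 and no Spark dependency. The helper `mapAsyncIterator` and the object `MapAsyncSketch` are hypothetical: they are not the code in the linked gist and not part of Spark's API. The helper wraps a partition's iterator, keeps at most `parallelism` futures in flight, and blocks only on the oldest one, so results come back in input order. In Spark it could be plugged into the existing `RDD.mapPartitions`, e.g. `rdd.mapPartitions(it => MapAsyncSketch.mapAsyncIterator(it, 16, Duration.Inf)(lookup))`, where `lookup: A => Future[B]` is a hypothetical asynchronous client call that, together with its `ExecutionContext`, must be usable on the executors.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object MapAsyncSketch {
  /** Hypothetical helper (not a Spark API): applies the asynchronous function
    * `f` to every element of `input`, keeping at most `parallelism` calls in
    * flight and returning results in input order. */
  def mapAsyncIterator[A, B](input: Iterator[A], parallelism: Int, timeout: Duration)(
      f: A => Future[B]): Iterator[B] = {
    require(parallelism > 0, "parallelism must be positive")

    new Iterator[B] {
      // Calls that have been started but whose results were not yet returned.
      private val inFlight = mutable.Queue.empty[Future[B]]

      // Start new calls until the window is full or the input is exhausted.
      private def fill(): Unit =
        while (inFlight.size < parallelism && input.hasNext) {
          inFlight.enqueue(f(input.next()))
        }

      override def hasNext: Boolean = {
        fill()
        inFlight.nonEmpty
      }

      override def next(): B = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        // Block only on the oldest call; younger calls keep running meanwhile.
        val result = Await.result(inFlight.dequeue(), timeout)
        // Refill the window so a new call overlaps with the caller's work.
        fill()
        result
      }
    }
  }

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    // Stand-in for a database or web-service call (hypothetical).
    def fakeLookup(x: Int): Future[Int] = Future(x * 10)

    val results = mapAsyncIterator(Iterator(1, 2, 3, 4, 5), 2, Duration.Inf)(x => fakeLookup(x)).toList
    println(results) // prints List(10, 20, 30, 40, 50)
  }
}
```

Preserving input order keeps the semantics of a plain `map`, and the bounded window caps both the load on the external service and the number of pending results held in executor memory.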

People

Assignee: Unassigned

Reporter: Paulo Costa

Votes: 0

Watchers: 1

Dates

Created: 05/Mar/16 18:52

Updated: 07/Mar/16 18:39

Resolved: 06/Mar/16 19:37