Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Not A Problem
None
None
Description
Spark is great for synchronous operations.
But sometimes I need to call a database, a web service, etc. from within a transformation, and the Spark pipeline stalls while it waits for each response.
Avoiding that would be great!
I suggest we add a new method, RDD.mapAsync(), which would run these calls concurrently and remove the bottleneck.
I've written a quick'n'dirty implementation of what I have in mind:
https://gist.github.com/paulo-raca/d121cf27905cfb1fafc3
What do you think?
If you agree with this feature, I can work on a pull request.
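For comparison, much of the proposed behavior can be approximated today on top of the existing `RDD.mapPartitions`. The sketch below is not the gist's implementation and not a Spark API. It is a minimal illustration in Python (PySpark), assuming the per-record call is available as an `async` function. The names `map_async_partition`, `max_in_flight` and `fetch` are hypothetical. The helper keeps up to `max_in_flight` calls outstanding per partition and yields results in input order.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


async def _cancel_all(tasks):
    # Cancel leftover calls and wait for them so no exception goes unretrieved.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


def map_async_partition(
    records: Iterable[T],
    fn: Callable[[T], Awaitable[U]],
    max_in_flight: int = 8,
) -> Iterator[U]:
    """Hypothetical helper: apply async `fn` to each record, keeping at most
    `max_in_flight` calls outstanding, and yield results in input order."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    loop = asyncio.new_event_loop()  # one private event loop per partition
    pending = deque()  # tasks in submission order, i.e. input order
    try:
        for record in records:
            pending.append(loop.create_task(fn(record)))
            if len(pending) >= max_in_flight:
                # Window is full: run the loop (advancing all in-flight calls)
                # until the oldest call finishes, then emit its result.
                yield loop.run_until_complete(pending.popleft())
        while pending:
            yield loop.run_until_complete(pending.popleft())
    finally:
        # Reached on normal exit, on an error, or if the consumer stops early.
        if pending:
            loop.run_until_complete(_cancel_all(list(pending)))
        loop.close()


if __name__ == "__main__":
    async def slow_double(x: int) -> int:
        # Stand-in for a database / web call with 0.1 s of latency.
        await asyncio.sleep(0.1)
        return 2 * x

    # 20 calls at 0.1 s each would take about 2 s if made one at a time.
    print(list(map_async_partition(range(20), slow_double, max_in_flight=10)))
    # [0, 2, 4, ..., 38]
```

With PySpark this would plug in as `rdd.mapPartitions(lambda it: map_async_partition(it, fetch, max_in_flight=16))`, where `fetch` is a hypothetical async client call. Bounding the number of in-flight calls keeps the external service from being flooded, and yielding in input order preserves the record order a plain `map` would produce.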