Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Done
Description
Currently, for the following job:
ds = ..
ds.map(map_func1)
  .map(map_func2)
The Python functions `map_func1` and `map_func2` run in separate Python workers. The result of `map_func1` is transferred to the JVM and then on to `map_func2`, which may reside in another Python worker. This introduces redundant communication and serialization/deserialization overhead.
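
To illustrate the idea behind the improvement, here is a minimal sketch in plain Python of fusing consecutive map functions into one callable, so that the intermediate result stays inside a single Python process instead of crossing the JVM boundary. The helper `chain` and the bodies of `map_func1` and `map_func2` are hypothetical and only for illustration; this is not the actual implementation.

```python
from typing import Any, Callable


def chain(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose map functions left to right into a single callable (hypothetical helper)."""
    def chained(value: Any) -> Any:
        # Each function consumes the previous function's result in-process,
        # so no serialization happens between the steps.
        for func in funcs:
            value = func(value)
        return value
    return chained


# Placeholder user functions; the real ones are whatever the job defines.
def map_func1(x: int) -> int:
    return x + 1


def map_func2(x: int) -> int:
    return x * 2


# One fused function replaces the two separate map operations.
fused = chain(map_func1, map_func2)
results = [fused(x) for x in [1, 2, 3]]
```

With the functions fused this way, a single worker would apply both steps to each record, and only the final result would need to be serialized and sent back.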