Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
An infrastructure platform might want to give the user a Zeppelin notebook and run the user's job from a command the user types, such as "python entry_script.py ...". This is better for end users because the entry script appears to live in their local workbench.
The platform might translate this into the following submarine command when submitting the job:
... job run
--localization entry_script.py:./
--localization dependency_script1.py:./
--localization dependency_script2.py:./
--worker_launch_cmd "python entry_script.py .."
Or
... job run
--localization entry_script.py:./
--localization dependency_scripts_dir:./
--worker_launch_cmd "python entry_script.py .."
Both of the above commands fail with a module import error raised from the entry script. YARN only creates symbolic links in the container's work dir, while the real script files sit in separate cache folders. Python resolves the entry script's symbolic link and searches for modules next to the real file, not in the work dir, so it does not find the dependencies.
One possible solution is to localize a single directory containing all scripts and change the worker_launch_cmd to "cd scripts_dir && python entry_script.py". But this hurts the user experience, because it no longer feels like a local workbench.
Another solution is to use the "PYTHONPATH" environment variable. This keeps the user experience good and needs no changes to YARN localization internals.
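The failure can be reproduced locally without YARN. The following is a minimal sketch, assuming a POSIX system where symbolic links can be created; the folder names cache_1, cache_2 and work are hypothetical stand-ins for YARN's cache folders and the container's work dir, and the script contents are invented for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_entry_via_symlinks():
    """Run a symlinked entry script whose dependency lives in another folder."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for separate cache folders and the work dir.
        entry_cache = os.path.join(root, "cache_1")
        dep_cache = os.path.join(root, "cache_2")
        work_dir = os.path.join(root, "work")
        for folder in (entry_cache, dep_cache, work_dir):
            os.makedirs(folder)

        # The real files live in different folders.
        with open(os.path.join(entry_cache, "entry_script.py"), "w") as f:
            f.write("import dependency_script1\nprint(dependency_script1.VALUE)\n")
        with open(os.path.join(dep_cache, "dependency_script1.py"), "w") as f:
            f.write("VALUE = 'ok'\n")

        # The work dir contains only symbolic links to the real files.
        for cache, name in (
            (entry_cache, "entry_script.py"),
            (dep_cache, "dependency_script1.py"),
        ):
            os.symlink(os.path.join(cache, name), os.path.join(work_dir, name))

        # Drop any inherited PYTHONPATH so it cannot hide the problem.
        env = dict(os.environ)
        env.pop("PYTHONPATH", None)
        return subprocess.run(
            [sys.executable, "entry_script.py"],
            cwd=work_dir,
            env=env,
            capture_output=True,
            text=True,
        )


result = run_entry_via_symlinks()
print(result.returncode)
print(result.stderr.strip().splitlines()[-1])
```

Although both links sit side by side in the work dir, the run is expected to exit with a non-zero code and an import error naming dependency_script1, because the module search starts from the folder of the real entry script.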
... job run
# the entry point
--localization entry_script.py:<path>/entry_script.py
# the dependency Python scripts of the entry point
--localization dependency_scripts_dir:<path>/dependency_scripts_dir
# the PYTHONPATH env to make the dependencies available to the entry script
--env PYTHONPATH="<path>/dependency_scripts_dir"
--worker_launch_cmd "python <path>/entry_script.py ..."
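The effect of the PYTHONPATH setting can also be checked locally. This is a rough sketch under the same assumptions (POSIX symbolic links, hypothetical folder names and script contents); the work folder plays the role of <path>, and nothing here calls submarine or YARN.

```python
import os
import subprocess
import sys
import tempfile


def run_with_pythonpath():
    """Run a symlinked entry script with PYTHONPATH pointing at the dependencies."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for separate cache folders and the work dir.
        entry_cache = os.path.join(root, "cache_1")
        dep_dir = os.path.join(root, "cache_2", "dependency_scripts_dir")
        work_dir = os.path.join(root, "work")
        for folder in (entry_cache, dep_dir, work_dir):
            os.makedirs(folder)

        with open(os.path.join(entry_cache, "entry_script.py"), "w") as f:
            f.write("import dependency_script1\nprint(dependency_script1.VALUE)\n")
        with open(os.path.join(dep_dir, "dependency_script1.py"), "w") as f:
            f.write("VALUE = 'ok'\n")

        # Link the entry script and the whole dependency directory into the work dir.
        linked_entry = os.path.join(work_dir, "entry_script.py")
        linked_deps = os.path.join(work_dir, "dependency_scripts_dir")
        os.symlink(os.path.join(entry_cache, "entry_script.py"), linked_entry)
        os.symlink(dep_dir, linked_deps)

        # Equivalent of: --env PYTHONPATH="<path>/dependency_scripts_dir"
        env = dict(os.environ)
        env["PYTHONPATH"] = linked_deps
        return subprocess.run(
            [sys.executable, linked_entry],
            env=env,
            capture_output=True,
            text=True,
        )


result = run_with_pythonpath()
print(result.returncode, result.stdout.strip())
```

Here the import succeeds because the linked dependency directory is on the module search path, regardless of where the real entry script is stored.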
And we should document this.