Apache Submarine / SUBMARINE-35

[Submarine] Document "PYTHONPATH" environment variable setting when using --localization options


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.2.0
    • None
    • None

    Description

      An infrastructure platform might want to give the user a Zeppelin notebook and run the user's job from a command the user types, such as "python entry_script.py ...". This is better for the end user because it feels as if "entry_script.py" lives in the local workbench.

      When submitting the job, the platform may translate this into a Submarine command such as:

      ... job run
        --localization entry_script.py:./
        --localization dependency_script1.py:./
        --localization dependency_script2.py:./
        --worker_launch_cmd "python entry_script.py .."
      

      Or:

      ... job run
        --localization entry_script.py:./
        --localization dependency_scripts_dir:./
        --worker_launch_cmd "python entry_script.py .."
      

      Both of the above commands fail with a module import error raised from entry_script.py. This is because YARN only creates symbolic links in the container's working directory (the real script files are in different cache folders), and Python's module import does not know that: Python puts the resolved location of the entry script on its module search path, not the working directory that holds the links.
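The failure can be reproduced outside YARN. The following is a minimal sketch, assuming a POSIX system where symlinks can be created; the directory names (`filecache/0`, `container_work_dir`) and the helper functions are hypothetical stand-ins, not YARN's actual cache layout. It places each script in its own directory, links both into one working directory, and runs the entry script from there.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_symlinked_work_dir(root: Path) -> Path:
    """Create real files in separate cache dirs and symlink them into one work dir.

    The directory names are hypothetical stand-ins for a localization cache.
    """
    work_dir = root / "container_work_dir"
    work_dir.mkdir()
    sources = {
        "entry_script.py": "import dependency_script1\nprint(dependency_script1.VALUE)\n",
        "dependency_script1.py": "VALUE = 'imported'\n",
    }
    for index, (name, body) in enumerate(sources.items()):
        cache_dir = root / "filecache" / str(index)
        cache_dir.mkdir(parents=True)
        (cache_dir / name).write_text(body)
        # The work dir only holds a link; the real file lives in its own cache dir.
        (work_dir / name).symlink_to(cache_dir / name)
    return work_dir


def run_entry_script(work_dir: Path) -> subprocess.CompletedProcess:
    """Run `python entry_script.py` from the work dir with PYTHONPATH unset."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    return subprocess.run(
        [sys.executable, "entry_script.py"],
        cwd=work_dir, env=env, capture_output=True, text=True,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_entry_script(build_symlinked_work_dir(Path(tmp)))
        print("exit code:", result.returncode)
        print(result.stderr)
```

Although both links sit side by side in the working directory, the import fails because the interpreter looks next to the real entry script, where the dependency is absent.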

      One possible solution is to localize a single directory containing all the scripts and change the worker_launch_cmd to "cd scripts_dir && python entry_script.py". But this makes for a poor user experience, because it no longer feels like working in a local workbench.
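As a rough illustration of why the single-directory workaround avoids the import error, the sketch below (again assuming a POSIX system, with hypothetical directory and helper names) links one directory holding every script into the working directory and runs the entry script from inside it, which is what "cd scripts_dir && python entry_script.py" does.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_scripts_dir_layout(root: Path) -> Path:
    """Put all scripts in one cached directory and symlink that directory into the work dir."""
    cached_scripts = root / "filecache" / "0" / "scripts_dir"
    work_dir = root / "container_work_dir"
    cached_scripts.mkdir(parents=True)
    work_dir.mkdir()
    (cached_scripts / "entry_script.py").write_text(
        "import dependency_script1\nprint(dependency_script1.VALUE)\n"
    )
    (cached_scripts / "dependency_script1.py").write_text("VALUE = 'imported'\n")
    # Only the directory is a link; the files inside it are real and co-located.
    (work_dir / "scripts_dir").symlink_to(cached_scripts, target_is_directory=True)
    return work_dir


def run_from_scripts_dir(work_dir: Path) -> subprocess.CompletedProcess:
    """Equivalent of `cd scripts_dir && python entry_script.py`, with PYTHONPATH unset."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    return subprocess.run(
        [sys.executable, "entry_script.py"],
        cwd=work_dir / "scripts_dir", env=env, capture_output=True, text=True,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_from_scripts_dir(build_scripts_dir_layout(Path(tmp)))
        print("exit code:", result.returncode)
        print(result.stdout)
```

Here the entry script and its dependency share one real directory, so the default module search path already covers both.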

      Another solution is to use the "PYTHONPATH" environment variable. This keeps the user experience good and requires no changes to YARN localization internals.

      ... job run
       # the entry point
       --localization entry_script.py:<path>/entry_script.py
       # the dependency Python scripts of the entry point
       --localization dependency_scripts_dir:<path>/dependency_scripts_dir
       # the PYTHONPATH env to make the dependencies available to the entry script
       --env PYTHONPATH="<path>/dependency_scripts_dir"
       --worker_launch_cmd "python <path>/entry_script.py ..."
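The effect of PYTHONPATH can be checked locally. This is a minimal sketch under the same assumptions as the command above (POSIX symlinks, an entry script and a dependency directory localized separately); the cache paths, the `helper` module, and the function names are hypothetical. It runs the entry script by path, first without and then with PYTHONPATH pointing at the linked dependency directory.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def build_layout(root: Path) -> Path:
    """Localize an entry script and a dependency directory as symlinks in a work dir."""
    cache_entry = root / "filecache" / "0"
    cache_deps = root / "filecache" / "1" / "dependency_scripts_dir"
    work_dir = root / "container_work_dir"
    for directory in (cache_entry, cache_deps, work_dir):
        directory.mkdir(parents=True)
    (cache_entry / "entry_script.py").write_text("import helper\nprint(helper.VALUE)\n")
    (cache_deps / "helper.py").write_text("VALUE = 'imported'\n")
    (work_dir / "entry_script.py").symlink_to(cache_entry / "entry_script.py")
    (work_dir / "dependency_scripts_dir").symlink_to(cache_deps, target_is_directory=True)
    return work_dir


def run_entry_script(work_dir: Path, pythonpath: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the entry script by path, optionally exporting PYTHONPATH to the child process."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    return subprocess.run(
        [sys.executable, str(work_dir / "entry_script.py")],
        env=env, capture_output=True, text=True,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = build_layout(Path(tmp))
        without = run_entry_script(work_dir)
        with_path = run_entry_script(work_dir, str(work_dir / "dependency_scripts_dir"))
        print("without PYTHONPATH, exit code:", without.returncode)
        print("with PYTHONPATH, exit code:", with_path.returncode)
```

Because PYTHONPATH entries are searched regardless of where the real entry script lives, the launch command itself can stay a plain "python <path>/entry_script.py ...".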

      We should document this.

          People

            tangzhankun Zhankun Tang