  1. Apache Submarine
  2. SUBMARINE-35

[Submarine] Document "PYTHONPATH" environment variable setting when using --localization options


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:
      None

      Description

      An infra platform might want to provide the user with a Zeppelin notebook and run the user's job from a command the user enters, such as "python entry_script.py ...". This is better for the end user because entry_script.py appears to live in a local workbench.

      When the platform submits the job, this may translate to one of the following Submarine commands:

       

      ... job run
        --localization entry_script.py:./
        --localization dependency_script1.py:./
        --localization dependency_script2.py:./
        --worker_launch_cmd "python entry_script.py ..."
      

      Or 

       

      ... job run
        --localization entry_script.py:./
        --localization dependency_scripts_dir:./
        --worker_launch_cmd "python entry_script.py ..."
      

       

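      Both of the above commands fail with a module import error raised from the entry script. This is because YARN only creates symbolic links in the container's working directory, while the real script files live in separate cache folders. Python resolves the symbolic link of the entry script before computing its module search path, so it searches the cache folder of the real file and never sees the links to the dependencies.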
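
      The failure can be reproduced outside YARN. The sketch below is a minimal illustration, not Submarine or YARN code: the folder names cache_entry, cache_dep, and work_dir are hypothetical stand-ins for YARN's cache folders and the container's working directory, and it assumes a POSIX system where symbolic links can be created. It runs the entry script through its link and captures the first search path entry together with the resulting import error.

```python
import os
import subprocess
import sys
import tempfile

# The entry script prints the first module search path entry, then tries
# to import a dependency that is only linked into the working directory.
ENTRY_SOURCE = "import sys\nprint(sys.path[0])\nimport dependency_script1\n"


def write_file(path, text):
    with open(path, "w") as handle:
        handle.write(text)


def run_entry_through_symlink():
    """Run entry_script.py via a symbolic link with PYTHONPATH unset."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        # Hypothetical layout: one cache folder per localized file.
        entry_cache = os.path.join(root, "cache_entry")
        dep_cache = os.path.join(root, "cache_dep")
        work_dir = os.path.join(root, "work_dir")
        for folder in (entry_cache, dep_cache, work_dir):
            os.mkdir(folder)
        write_file(os.path.join(entry_cache, "entry_script.py"), ENTRY_SOURCE)
        write_file(os.path.join(dep_cache, "dependency_script1.py"), "VALUE = 1\n")
        # The working directory holds only symbolic links to the real files.
        for cache, name in ((entry_cache, "entry_script.py"),
                            (dep_cache, "dependency_script1.py")):
            os.symlink(os.path.join(cache, name), os.path.join(work_dir, name))
        env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
        return subprocess.run(
            [sys.executable, "entry_script.py"],
            cwd=work_dir, env=env, capture_output=True, text=True,
        )


if __name__ == "__main__":
    result = run_entry_through_symlink()
    print("sys.path[0]:", result.stdout.strip())
    print("exit code:", result.returncode)
    print(result.stderr.strip().splitlines()[-1])
```

      In this layout the printed sys.path[0] is the cache_entry folder rather than work_dir, which is why the link to dependency_script1.py is not found.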

      One possible solution is to localize a single directory containing all the scripts and change worker_launch_cmd to "cd scripts_dir && python entry_script.py". However, this hurts the user experience, because it no longer feels like working in a local workbench.

      Another solution is to set the "PYTHONPATH" environment variable. This keeps the user experience intact and requires no changes to YARN localization internals:

      ... job run
       # the entry point
       --localization entry_script.py:<path>/entry_script.py
       # the dependency Python scripts of the entry point
       --localization dependency_scripts_dir:<path>/dependency_scripts_dir
       # the PYTHONPATH env to make dependency available to entry script
       --env PYTHONPATH="<path>/dependency_scripts_dir"
       --worker_launch_cmd "python <path>/entry_script.py ..."

      We should document this.
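
      The PYTHONPATH approach above can be checked locally with a small sketch. It is an illustration under assumptions, not Submarine or YARN code: cache_entry, cache_dep, work_dir, and the module name dependency_helper are hypothetical, and a POSIX system with symbolic link support is assumed. The dependency directory is linked into the working directory, and PYTHONPATH points at that link, mirroring the --env option in the command.

```python
import os
import subprocess
import sys
import tempfile


def write_file(path, text):
    with open(path, "w") as handle:
        handle.write(text)


def run_with_pythonpath():
    """Run a linked entry script with PYTHONPATH set to the linked dependency dir."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        # Hypothetical layout: real files live in separate cache folders.
        entry_cache = os.path.join(root, "cache_entry")
        real_dep_dir = os.path.join(root, "cache_dep", "dependency_scripts_dir")
        work_dir = os.path.join(root, "work_dir")
        os.mkdir(entry_cache)
        os.makedirs(real_dep_dir)
        os.mkdir(work_dir)
        write_file(
            os.path.join(entry_cache, "entry_script.py"),
            "import dependency_helper\nprint(dependency_helper.greet())\n",
        )
        write_file(
            os.path.join(real_dep_dir, "dependency_helper.py"),
            "def greet():\n    return 'dependency imported'\n",
        )
        # The working directory holds only symbolic links.
        linked_entry = os.path.join(work_dir, "entry_script.py")
        linked_dep_dir = os.path.join(work_dir, "dependency_scripts_dir")
        os.symlink(os.path.join(entry_cache, "entry_script.py"), linked_entry)
        os.symlink(real_dep_dir, linked_dep_dir, target_is_directory=True)
        # Equivalent of: --env PYTHONPATH="<path>/dependency_scripts_dir"
        env = dict(os.environ, PYTHONPATH=linked_dep_dir)
        return subprocess.run(
            [sys.executable, linked_entry],
            env=env, capture_output=True, text=True,
        )


if __name__ == "__main__":
    result = run_with_pythonpath()
    print(result.stdout.strip())
```

      The import succeeds here because entries from PYTHONPATH are added to the module search path regardless of where the real entry script file lives.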

        Attachments

        1. YARN-9160-trunk.001.patch
          1 kB
          Zhankun Tang


            People

            • Assignee: Zhankun Tang (tangzhankun)
            • Reporter: Zhankun Tang (tangzhankun)