Spark / SPARK-1134

ipython won't run standalone python script


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.1
    • Fix Version/s: 0.9.2, 1.0.0
    • Component/s: PySpark

    Description

      Using Spark 0.9.0, Python 2.6.6, and IPython 1.1.0.

      The problem: if I want to run a Python script as a standalone app, the docs say I should execute the command "pyspark myscript.py". This works as long as IPYTHON=0, but fails if IPYTHON=1.

      This problem arose for me because I tried to save myself typing by setting IPYTHON=1 in my shell profile script, which then left me unable to execute standalone pyspark scripts.

      My analysis:
      In the pyspark launch script, command-line arguments are simply ignored when IPython is used:

      if [[ "$IPYTHON" = "1" ]] ; then
        exec ipython $IPYTHON_OPTS
      else
        exec "$PYSPARK_PYTHON" "$@"
      fi

      I thought I could work around this by changing the script to pass "$@" to ipython as well. However, that doesn't work either: doing so results in an error saying multiple SparkContexts can't be run at once.

      This is due to a quirk (feature? bug?) of IPython related to the PYTHONSTARTUP environment variable. The pyspark script sets this variable to point to the python/shell.py script, which initializes the SparkContext. Regular Python runs the PYTHONSTARTUP script only when invoked in interactive mode; when run with a script, it ignores the variable. IPython, however, runs that startup script every time, regardless of how it was invoked. That means it always executes Spark's shell.py and initializes a SparkContext even when it was launched with a script, so the script's own attempt to create a SparkContext then fails.
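      This difference is easy to see outside of Spark. A minimal sketch (the temp-file contents and the python3 binary are illustrative, not part of Spark) showing that plain Python ignores PYTHONSTARTUP when given a script:

```shell
# Create a throwaway startup file and a throwaway script.
startup=$(mktemp)
script=$(mktemp)
echo 'print("startup ran")' > "$startup"
echo 'print("script ran")'  > "$script"

# Plain Python reads PYTHONSTARTUP only in interactive sessions,
# so running a script file ignores it entirely.
out=$(PYTHONSTARTUP="$startup" python3 "$script")
echo "$out"    # prints only: script ran

rm -f "$startup" "$script"
```

      Under IPython 1.x, by contrast, the startup file runs even when a script is given, which is exactly what triggers the duplicate-SparkContext error described earlier.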

      Proposed solution:
      short term: add this information to the Spark docs regarding IPython. Something like: "Note: IPython can only be used interactively. Use regular Python to execute PySpark script files."
      long term: change the pyspark script to detect whether arguments were passed in; if so, call plain Python instead of IPython, or don't set the PYTHONSTARTUP variable. Alternatively, fix shell.py to detect when it is being invoked non-interactively and skip initializing sc.
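      The long-term idea could be sketched as a small change to the launcher. This is a hypothetical helper (the function name and the python fallback are mine, not Spark's actual code): fall back to plain Python whenever a script argument is present, so shell.py only fires via PYTHONSTARTUP in interactive sessions.

```shell
# Hypothetical helper: decide which interpreter the pyspark launch
# script should exec, based on whether any script arguments were given.
choose_interpreter() {
  if [[ "$IPYTHON" = "1" && $# -eq 0 ]]; then
    # No script given: interactive session, let shell.py set up sc.
    echo ipython
  else
    # Script given: plain Python ignores PYTHONSTARTUP, so shell.py
    # won't create a competing SparkContext.
    echo "${PYSPARK_PYTHON:-python}"
  fi
}

# The launch script could then end with something like:
#   exec "$(choose_interpreter "$@")" "$@"
```

      With this shape, IPYTHON=1 still gives an IPython REPL when no script is named, while "pyspark myscript.py" always goes through plain Python.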

          People

            Assignee: Diana Carroll (dcarroll@cloudera.com)
            Reporter: Diana Carroll (dcarroll@cloudera.com)
            Votes: 0
            Watchers: 2

