Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
1.2.0
None
Description
The ./python/run-tests script runs the doctests in ./pyspark/mllib/classification.py, and they fail with an ImportError.
The output is as follows:
$ ./python/run-tests
...
Running test: pyspark/mllib/classification.py
Traceback (most recent call last):
  File "pyspark/mllib/classification.py", line 20, in <module>
    import numpy
  File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/__init__.py", line 170, in <module>
    from . import add_newdocs
  File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/core/__init__.py", line 46, in <module>
    from numpy.testing import Tester
  File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/__init__.py", line 13, in <module>
    from .utils import *
  File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/utils.py", line 15, in <module>
    from tempfile import mkdtemp
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tempfile.py", line 34, in <module>
    from random import Random as _Random
  File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/mllib/random.py", line 24, in <module>
    from pyspark.rdd import RDD
  File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/context.py", line 22, in <module>
    from tempfile import NamedTemporaryFile
ImportError: cannot import name NamedTemporaryFile
0.07 real 0.04 user 0.02 sys
Had test failures; see logs.
The problem is a cyclic import of the tempfile module.
The cause is that the pyspark.mllib.random module lives in the same directory as the pyspark.mllib.classification module.
The classification module imports numpy, which in turn imports tempfile.
Because classification.py is executed as a script, the first entry of sys.path is its directory, "./python/pyspark/mllib", so the "from random import ..." statement inside tempfile picks up pyspark.mllib.random instead of the standard library "random" module.
That module imports pyspark.rdd, which leads to pyspark.context importing tempfile again while tempfile is still only partially initialized, so NamedTemporaryFile is not yet defined and the import fails.
Summary: classification → numpy → tempfile → pyspark.mllib.random → pyspark.rdd → pyspark.context → tempfile (cyclic import!)
Furthermore, the standard library has a stat module, and a pyspark.mllib.stat module also exists, which may cause the same kind of trouble.
commit: 0e8203f4fb721158fb27897680da476174d24c4b
A fundamental solution is to avoid module names that are already used by the standard library (currently "random" and "stat").
The difficulty with this solution is that it requires renaming pyspark.mllib.random and pyspark.mllib.stat, which may already be in use.
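To find such collisions before they cause trouble, a directory can be scanned for modules whose names match standard library modules. This is only a sketch under stated assumptions: find_shadowing_modules is a hypothetical helper, not part of Spark, and the default name list relies on sys.stdlib_module_names, which exists only on Python 3.10 and later (on older interpreters the caller has to pass the names explicitly).

```python
import os
import sys


def find_shadowing_modules(directory, stdlib_names=None):
    """Return sorted names of modules/packages in `directory` that match stdlib names."""
    if stdlib_names is None:
        # Only available on Python 3.10+; otherwise fall back to an empty set.
        stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    found = []
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        name, ext = os.path.splitext(entry)
        is_module = ext == ".py" and os.path.isfile(path)
        is_package = os.path.isfile(os.path.join(path, "__init__.py"))
        candidate = name if is_module else entry
        if (is_module or is_package) and candidate in stdlib_names:
            found.append(candidate)
    return sorted(found)


if __name__ == "__main__":
    # Example: pass the directory to check as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(find_shadowing_modules(target))
```

Run against a directory containing random.py and stat.py, with "random" and "stat" among the given names, the helper would report both.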