Description
There was an actual issue, SPARK-23300, which we fixed by manually checking whether the package is installed. That approach required duplicated code and could only check for the dependency itself, whereas there are many conditions to handle, for example Python-version-specific checks or other packages such as NumPy. This is something we should fix.
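A centralized dependency check could look something like the sketch below. The helper name `require_package` and the naive version parsing are hypothetical illustrations, not Spark's actual helpers; the skip-message wording mirrors the messages shown later in this issue.

```python
import unittest


def require_package(name, min_version):
    """Check an optional test dependency once; return (ok, skip_reason).

    Hypothetical helper sketching the centralized check this change
    proposes. The version comparison below is naive (numeric dotted
    versions only) and is for illustration.
    """
    try:
        module = __import__(name)
    except ImportError:
        return (False, "%s >= %s must be installed; however, "
                       "it was not found." % (name, min_version))
    installed = tuple(int(p) for p in module.__version__.split(".")[:3])
    required = tuple(int(p) for p in min_version.split(".")[:3])
    if installed < required:
        return (False, "%s >= %s must be installed; however, "
                       "your version was %s." % (name, min_version,
                                                 module.__version__))
    return (True, None)


# Computed once at import time so every test class can reuse it.
_have_pandas, _pandas_requirement_message = require_package("pandas", "0.19.2")


@unittest.skipIf(not _have_pandas, _pandas_requirement_message)
class ArrowTests(unittest.TestCase):
    def test_createDataFrame_column_name_encoding(self):
        pass  # placeholder body for the sketch
```

With a single helper like this, each test module can gate classes or methods on the same check instead of duplicating the import-and-compare logic.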
The `unittest` module can print out skipped-test messages, but so far our own testing script has swallowed them. This PR prints the messages out, sorted.
This PR proposes to remove the duplicated dependency-checking logic and also print out the skipped tests from the unittests. For example, as below:
```
Skipped tests in pyspark.sql.tests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    ...

Skipped tests in pyspark.sql.tests with python3:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    ...
```
The actual format may vary a bit per the discussion in the PR; please check the PR for the exact format.
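One way the testing script could surface these messages is to run each suite with `verbosity=2` into a buffer and filter out the `... skipped` lines. This is a minimal illustrative sketch, not the PR's actual script; the test class here is a stand-in.

```python
import io
import unittest


class ArrowLikeTests(unittest.TestCase):
    """Stand-in for a real test class whose tests get skipped."""

    @unittest.skip("Pandas >= 0.19.2 must be installed; "
                   "however, it was not found.")
    def test_createDataFrame_column_name_encoding(self):
        pass


# Run the suite into an in-memory stream instead of the console.
stream = io.StringIO()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArrowLikeTests)
unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)

# At verbosity=2 each skipped test is reported on its own line,
# ending with: ... skipped '<reason>'. Collect and sort those lines.
skipped = sorted(line for line in stream.getvalue().splitlines()
                 if " ... skipped " in line)
for line in skipped:
    print(line)
```

Because the runner's output is captured rather than discarded, the skip reasons are no longer swallowed and can be printed in one sorted summary per Python executable.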
Issue Links
- relates to SPARK-23300 "Print out if Pandas and PyArrow are installed or not in tests" (Resolved)