Description
I was running the Spark tests with pandas 0.23.4 and got the error below. The minimum supported pandas version is currently 0.23.2 (Github). Upgrading to pandas 0.24.0 fixes the error, so Spark appears to depend on this fix (Github) in pandas.
$ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv'
...
======================================================================
ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv
    self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int))
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper
    result = safe_na_op(lvalues, rvalues)
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op
    return na_op(lvalues, rvalues)
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op
    result = missing.fill_zeros(result, x, y, op_name, fill_zeros)
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros
    signs = np.sign(y if name.startswith(('r', '__r')) else x)
TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool')
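The failure can be reproduced without Spark at all; the traceback bottoms out in pandas itself. A minimal sketch of the failing operation (a boolean Series floor-divided by its integer cast, mirroring the `b_pser // b_pser.astype(int)` line in the test; the variable names here are illustrative, not from the Spark test suite):

```python
import pandas as pd

# Boolean Series floor-divided by its int cast. On pandas 0.23.x this
# raised "TypeError: ufunc 'sign' did not contain a loop with signature
# matching types dtype('bool') dtype('bool')" inside missing.fill_zeros;
# on pandas >= 0.24.0 it completes.
b = pd.Series([True, False, True])
result = b // b.astype(int)
# True // 1 gives 1; False // 0 is a 0 // 0 case, which pandas masks
# to NaN, so the result comes back as a float Series.
print(result)
```

Since the offending `np.sign` call only fires when the divisor contains zeros, the `False` entry (which casts to 0) is what triggers the bug.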
These are my relevant package versions:
$ conda list | grep -e numpy -e pyarrow -e pandas -e python
# packages in environment at /home/circleci/miniconda/envs/python3:
numpy                     1.16.6          py36h0a8e133_3
numpy-base                1.16.6          py36h41b4c56_3
pandas                    0.23.4          py36h04863e7_0
pyarrow                   1.0.1           py36h6200943_36_cpu  conda-forge
python                    3.6.12          hcff3b4d_2           anaconda
python-dateutil           2.8.1           py_0                 anaconda
python_abi                3.6             1_cp36m              conda-forge
Issue Links
- is related to SPARK-37514 Remove workarounds due to older pandas (Resolved)