SPARK-37465

PySpark tests failing on Pandas 0.23


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      I ran the Spark tests with pandas 0.23.4 and got the error below. The minimum supported pandas version is currently 0.23.2 (GitHub). Upgrading to pandas 0.24.0 fixes the error, so I think Spark needs this fix (GitHub) in pandas.

      $ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv'
      
      ...
      
      ======================================================================
      ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv
          self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int))
        File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper
          result = safe_na_op(lvalues, rvalues)
        File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op
          return na_op(lvalues, rvalues)
        File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op
          result = missing.fill_zeros(result, x, y, op_name, fill_zeros)
        File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros
          signs = np.sign(y if name.startswith(('r', '__r')) else x)
      TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool')
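The last frame of the traceback shows `np.sign` being handed a boolean array. A minimal probe of that one call, independent of Spark and pandas, might look like the sketch below. The helper name `sign_accepts` is hypothetical, and the result for boolean input may depend on the installed NumPy version, so the snippet reports what it finds rather than assuming an outcome.

```python
import numpy as np


def sign_accepts(values):
    """Return True if np.sign can process `values`, False if it raises TypeError."""
    try:
        np.sign(values)
    except TypeError:
        # Same exception class as the one at the bottom of the traceback.
        return False
    return True


bools = np.array([True, False, True])

# Boolean input mirrors what fill_zeros passed to np.sign in the traceback.
print("np.sign on bool dtype:", sign_accepts(bools))

# Casting to int first mirrors the `.astype(int)` operand used in the test.
print("np.sign on int dtype: ", sign_accepts(bools.astype(int)))
```

If the first line prints `False` while the second prints `True`, the failure is reproducible at the NumPy level and the fix has to come from how pandas prepares its operands.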
      

      These are my relevant package versions:

      $ conda list | grep -e numpy -e pyarrow -e pandas -e python
      # packages in environment at /home/circleci/miniconda/envs/python3:
      numpy                     1.16.6           py36h0a8e133_3  
      numpy-base                1.16.6           py36h41b4c56_3  
      pandas                    0.23.4           py36h04863e7_0  
      pyarrow                   1.0.1           py36h6200943_36_cpu    conda-forge
      python                    3.6.12               hcff3b4d_2    anaconda
      python-dateutil           2.8.1                      py_0    anaconda
      python_abi                3.6                     1_cp36m    conda-forg
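Given the versions listed above, a guard for the affected range could be sketched as follows. The names `parse_version`, `is_affected`, and `FIRST_PASSING_PANDAS` are hypothetical and are not part of PySpark. The sketch treats every release below 0.24.0 as affected, which is an assumption: the report only confirms that 0.23.4 fails and 0.24.0 passes.

```python
# Assumption: 0.24.0 is the first passing release, as stated in the report.
FIRST_PASSING_PANDAS = "0.24.0"


def parse_version(text):
    """Turn a plain dotted version such as '0.23.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_affected(installed, first_passing=FIRST_PASSING_PANDAS):
    """Return True if `installed` is older than the first passing release."""
    return parse_version(installed) < parse_version(first_passing)


for candidate in ("0.23.2", "0.23.4", "0.24.0"):
    status = "affected" if is_affected(candidate) else "ok"
    print(candidate, status)
```

Comparing tuples of integers avoids the string-comparison trap where `"0.9.0"` sorts after `"0.24.0"`. The parser only handles purely numeric versions; suffixes such as `rc1` would need a real version library.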
      


          People

            Assignee: Yikun Jiang (yikunkero)
            Reporter: Willi Raschkowski (rshkv)