Spark / SPARK-34703

Fix pyspark test when using sort_values on Pandas

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.7
    • Fix Version/s: 2.4.8
    • Component/s: PySpark
    • Labels: None

    Description

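      Three PySpark tests are currently failing in the Jenkins build for Spark 2.4: test_column_order, test_complex_groupby, and test_udf_with_key. All three fail with the same pandas ValueError, raised when sort_values is called on a DataFrame in which 'id' is both an index level and a column label.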

      ======================================================================                                                                                                                                                                 
      ERROR: test_column_order (pyspark.sql.tests.GroupedMapPandasUDFTests)                                                                                                                                                                  
      ----------------------------------------------------------------------
      Traceback (most recent call last):                                                                                 
        File "/spark/python/pyspark/sql/tests.py", line 5996, in test_column_order                                                                                                                                                           
          expected = pd_result.sort_values(['id', 'v']).reset_index(drop=True)                                           
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4711, in sort_values                    
          for x in by]                                       
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1702, in _get_label_or_level_values
          self._check_label_or_level_ambiguity(key, axis=axis)
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1656, in _check_label_or_level_ambiguity
          raise ValueError(msg)                                                                                          
      ValueError: 'id' is both an index level and a column label, which is ambiguous.
                                                               
      ======================================================================
      ERROR: test_complex_groupby (pyspark.sql.tests.GroupedMapPandasUDFTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):                      
        File "/spark/python/pyspark/sql/tests.py", line 5765, in test_complex_groupby
          expected = expected.sort_values(['id', 'v']).reset_index(drop=True)
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4711, in sort_values
          for x in by]                                                                                                   
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1702, in _get_label_or_level_values                                                                                                                       
          self._check_label_or_level_ambiguity(key, axis=axis)                                                                                                                                                                               
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1656, in _check_label_or_level_ambiguity                                                                                                                  
          raise ValueError(msg)                                                                                                                                                                                                              
      ValueError: 'id' is both an index level and a column label, which is ambiguous.                                                                                                                                                        
                                                                                                                                                                                                                                             
      ======================================================================                                             
      ERROR: test_udf_with_key (pyspark.sql.tests.GroupedMapPandasUDFTests)                                                                                                                                                                  
      ----------------------------------------------------------------------                                             
      Traceback (most recent call last):                                                                                                                                                                                                     
        File "/spark/python/pyspark/sql/tests.py", line 5922, in test_udf_with_key
          .sort_values(['id', 'v']).reset_index(drop=True)                                                                                                                                                                                   
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4711, in sort_values                                                                                                                                        
          for x in by]                                                                                                                                                                                                                       
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1702, in _get_label_or_level_values   
          self._check_label_or_level_ambiguity(key, axis=axis)                                                                                                                                                                               
        File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1656, in _check_label_or_level_ambiguity                                                                                                                  
          raise ValueError(msg)                                                                                                                                                                                                              
      ValueError: 'id' is both an index level and a column label, which is ambiguous.   
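
The ambiguity reported in the tracebacks above can be reproduced outside of Spark. The following is a minimal sketch, not the patch that resolved this issue: the sample data and the helper name sort_without_index_ambiguity are hypothetical, and it assumes a pandas version that raises ValueError (rather than only warning) when a sort key names both an index level and a column. Dropping the index before sorting is one way to remove the conflict.

```python
import pandas as pd

# Hypothetical sample data; the real tests build their frames from Spark results.
df = pd.DataFrame({"id": [2, 1, 1], "v": [3.0, 2.0, 1.0]})

# Keep 'id' as a column while also making it the index, so the label 'id'
# refers to both an index level and a column (the situation in the tracebacks).
ambiguous = df.set_index("id", drop=False)


def sort_without_index_ambiguity(pdf, by):
    """Discard the existing index, sort by the given columns, then renumber rows."""
    return pdf.reset_index(drop=True).sort_values(by).reset_index(drop=True)


try:
    ambiguous.sort_values(["id", "v"])
except ValueError as exc:
    # On affected pandas versions the sort is rejected as ambiguous.
    print("sort_values failed:", exc)

fixed = sort_without_index_ambiguity(ambiguous, ["id", "v"])
print(fixed)
```

Calling reset_index(drop=True) first throws away the 'id' index level instead of inserting it as a second column, so sort_values can only resolve 'id' to the column.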
      

          People

            Assignee: viirya L. C. Hsieh
            Reporter: viirya L. C. Hsieh
            Votes: 0
            Watchers: 3