Spark / SPARK-40178

Rebalance/Repartition Hints Not Working in PySpark

Details

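    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0, 3.2.1, 3.2.2, 3.3.0
    • Fix Version/s: 4.0.0
    • Component/s: PySpark
    • Labels: None
    • Environment: Mac OSX 11.4 Big Sur; Python 3.9.7; Spark version >= 3.2.0 (possibly earlier versions as well)
    • Flags: Patch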

    Description

      Partitioning hints in PySpark do not work because the column parameters are not converted to Catalyst `Expression` instances before being passed to the hint resolver.

      The behavior of the hints is documented here.

      Example:

      >>> df = spark.range(1024)
      >>> 
      >>> df
      DataFrame[id: bigint]
      >>> df.hint("rebalance", "id")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
          jdf = self._jdf.hint(name, self._jseq(parameters))
        File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
        File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
          raise converted from None
      pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include columns, but id found
      >>> df.hint("repartition", "id")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
          jdf = self._jdf.hint(name, self._jseq(parameters))
        File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
        File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
          raise converted from None
      pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should include columns, but id found 
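
One possible workaround on affected versions is to express the hint in SQL, where the parser turns the hint parameters into expressions itself, so the missing conversion in `DataFrame.hint` should not come into play. The sketch below only builds the query string; `with_partition_hint_sql` and the view name `ids` are hypothetical names chosen for illustration and are not part of PySpark.

```python
def with_partition_hint_sql(view_name, hint, *columns):
    """Build a SELECT that applies a partitioning hint to a temp view.

    Hypothetical helper for illustration; not part of PySpark.
    """
    # The names are interpolated into SQL, so accept simple identifiers only.
    for name in (view_name, hint, *columns):
        if not name.isidentifier():
            raise ValueError(f"not a simple identifier: {name!r}")
    hint_text = hint.upper()
    if columns:
        # Column names go inside the hint's parentheses, comma-separated.
        hint_text += "(" + ", ".join(columns) + ")"
    return f"SELECT /*+ {hint_text} */ * FROM {view_name}"


# Usage against a live session (needs Spark, so it is left commented out):
# df.createOrReplaceTempView("ids")
# rebalanced = spark.sql(with_partition_hint_sql("ids", "rebalance", "id"))
```

For the `repartition` case specifically, `df.repartition("id")` is another way to request the same partitioning without going through the hint API.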

People

    • Assignee: advancedxy YE
    • Reporter: mhconradt Maxwell Conradt

Time Tracking

    • Original Estimate: 168h
    • Remaining Estimate: 168h
    • Time Spent: Not Specified