[SPARK-40178] Rebalance/Repartition Hints Not Working in PySpark - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0, 3.2.1, 3.3.0, 3.2.2
Fix Version/s: 4.0.0
Component/s: PySpark
Labels: None
Environment:

macOS 11.4 Big Sur

Python 3.9.7

Spark version >= 3.2.0 (possibly earlier versions as well).

Flags: Patch

Description

Partitioning hints in PySpark do not work because the column parameters are not converted to Catalyst `Expression` instances before being passed to the hint resolver.

The behavior of the hints is documented here.

Example:

>>> df = spark.range(1024)
>>> 
>>> df
DataFrame[id: bigint]
>>> df.hint("rebalance", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include columns, but id found
>>> df.hint("repartition", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should include columns, but id found
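
A possible workaround on affected versions is to express the hint in SQL, where the column arguments are parsed as expressions rather than passed as Python strings. The sketch below only builds the query text; `hinted_query` and the view name `ids` are hypothetical names introduced for illustration, not part of PySpark, and the commented usage assumes a live `SparkSession` bound to `spark`.

```python
def hinted_query(view_name: str, hint: str, *columns: str) -> str:
    """Build a SELECT statement carrying a partitioning hint as a SQL comment.

    Hypothetical helper, not part of PySpark. Names are inserted verbatim,
    so only pass trusted view and column identifiers.
    """
    if not columns:
        raise ValueError("at least one column name is required")
    args = ", ".join(columns)
    # Spark SQL hints use the /*+ NAME(args) */ comment form after SELECT.
    return f"SELECT /*+ {hint.upper()}({args}) */ * FROM {view_name}"


if __name__ == "__main__":
    print(hinted_query("ids", "rebalance", "id"))
    # Assumed usage with a running SparkSession named `spark`:
    #   spark.range(1024).createOrReplaceTempView("ids")
    #   spark.sql(hinted_query("ids", "rebalance", "id")).explain()
```

For the repartition case specifically, `df.repartition("id")` is an existing DataFrame method that takes column names directly and avoids the hint path altogether.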

Issue Links

links to

[Github] Pull Request #37616 (mhconradt)

[Github] Pull Request #42255 (advancedxy)

People

Assignee: YE

Reporter: Maxwell Conradt

Votes: 0

Watchers: 5

Dates

Created: 22/Aug/22 11:50

Updated: 21/Aug/23 00:29

Resolved: 21/Aug/23 00:29

Time Tracking

Estimated: 168h

Remaining: 168h

Logged: Not Specified