Description
Partitioning hints in PySpark do not work because the column parameters are not converted to Catalyst `Expression` instances before being passed to the hint resolver.
The expected behavior of these hints is documented in the Spark SQL reference, in the section on partitioning hints.
Example:
>>> df = spark.range(1024)
>>>
>>> df
DataFrame[id: bigint]
>>> df.hint("rebalance", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include columns, but id found
>>> df.hint("repartition", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should include columns, but id found
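
Possible workaround (a sketch, not verified against this build): the same hints can be written in their SQL comment form, where the parser reads `id` as a column reference rather than a string. The helper `hinted_select` and the view name `ids` are hypothetical names used only for illustration. The helper assumes plain, unquoted identifiers.

```python
def hinted_select(hint_name, columns, view_name):
    """Build a SELECT statement carrying a partitioning hint as a SQL comment.

    Hypothetical helper: assumes column and view names are simple
    identifiers that need no quoting or escaping.
    """
    params = ", ".join(columns)
    # A hint without parameters is written without parentheses.
    hint = f"{hint_name.upper()}({params})" if params else hint_name.upper()
    return f"SELECT /*+ {hint} */ * FROM {view_name}"


query = hinted_select("rebalance", ["id"], "ids")
print(query)

# With a live SparkSession (not executed here), the query could be used as:
#   df.createOrReplaceTempView("ids")
#   rebalanced = spark.sql(query)
```

This only sidesteps the Python-side parameter conversion. `DataFrame.hint` itself still needs the fix described above.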