[SPARK-33415] Column.__repr__ shouldn't encode JVM response - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: PySpark, SQL
Labels: None
Target Version/s: 3.1.0

Description

At the moment, PySpark Column encodes the JVM response in its __repr__ method.

As a result, column names that use only ASCII characters get a b prefix:

>>> from pyspark.sql.functions import col
>>> col("abc")
Column<b'abc'>

and all other names are rendered as an ugly byte string:

>>> col("wąż")
Column<b'w\xc4\x85\xc5\xbc'>
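To illustrate where the b prefix comes from, here is a minimal pure-Python sketch. It assumes the JVM response arrives as an ordinary Python str; the jvm_string argument and the current_repr helper are hypothetical stand-ins, and no PySpark is involved. Formatting an encoded bytes object with %s uses the bytes repr, which adds the b'...' wrapper and escapes every non-ASCII byte.

```python
def current_repr(jvm_string: str) -> str:
    # Hypothetical helper mimicking the problematic pattern: jvm_string stands
    # in for the JVM Column's string representation (an assumption).
    # .encode() turns the str into bytes, and %s then formats the bytes object
    # via its repr, producing b'...' with escaped UTF-8 bytes.
    return "Column<%s>" % jvm_string.encode("utf8")


print(current_repr("abc"))  # Column<b'abc'>
print(current_repr("wąż"))  # Column<b'w\xc4\x85\xc5\xbc'>
```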

This behaviour is inconsistent with other parts of the API, for example:

>>> spark.createDataFrame([], "`wąż` long")
DataFrame[wąż: bigint]

and with Scala:

scala> col("wąż")
res0: org.apache.spark.sql.Column = wąż

and with R:

> column("wąż")
Column wąż

Encoding was originally introduced in SPARK-5859, but it doesn't seem to be required.

Desired behaviour (either of the following):

>>> col("wąż")
Column<'wąż'>

>>> col("wąż")
Column<wąż>
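A minimal sketch of the two desired outputs, under the same assumption that the JVM response is a plain Python str. The desired_repr helper and its quoted flag are hypothetical and only illustrate the formatting, not the actual PySpark implementation. Since Python 3 strings are already Unicode, the value can be formatted directly with no encoding step.

```python
def desired_repr(jvm_string: str, quoted: bool = True) -> str:
    # Hypothetical helper: jvm_string stands in for the JVM Column's string
    # representation. No .encode() call, so the str is formatted as-is.
    name = "'%s'" % jvm_string if quoted else jvm_string
    return "Column<%s>" % name


print(desired_repr("wąż"))                # Column<'wąż'>
print(desired_repr("wąż", quoted=False))  # Column<wąż>
```

Quoting the name (the first option) keeps names containing spaces or angle brackets unambiguous, while the unquoted form matches the Scala and R output shown above.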