[SPARK-18621] PySpark SQL Types (aka DataFrame Schema) have __repr__() with Scala and not Python representation - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.6.2, 2.0.2
Fix Version/s: 3.3.0
Component/s: PySpark
Labels:
None

Description

When using Python's repr() on an object, the expected result is a string that Python can evaluate to construct the object.
See: https://docs.python.org/2/library/functions.html#func-repr
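For illustration, here is a minimal sketch of that contract using a standard-library type whose repr() is known to round-trip. The variable names are chosen for this example only and do not come from the issue.

```python
import datetime

# A standard-library object whose repr() is valid Python source.
d = datetime.date(2016, 11, 29)

text = repr(d)        # 'datetime.date(2016, 11, 29)'
rebuilt = eval(text)  # works because the name `datetime` is in scope

# The rebuilt object compares equal to the original.
assert rebuilt == d
```

Not every type offers this guarantee, but where the representation looks like a constructor call, readers tend to expect that it can be pasted back into code.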

However, when getting a DataFrame schema in PySpark, the code (in the "__repr__()" override methods) returns the Scala string representation rather than the Python one.

Relevant code in PySpark:
https://github.com/apache/spark/blob/5f02d2e5b4d37f554629cbd0e488e856fffd7b6b/python/pyspark/sql/types.py#L442

Python Code:

from pyspark.sql.types import StructType, StructField, StringType

# 1. define the object
struct1 = StructType([StructField("f1", StringType(), True)])
# 2. print its representation, expected to look like the line above
print(repr(struct1))
# 3. actual result:
# StructType(List(StructField(f1,StringType,true)))
# 4. try to use the result in code (this line is expected to fail)
struct2 = StructType(List(StructField(f1,StringType,true)))
# 5. get a bunch of errors:
# Unresolved reference 'List'
# Unresolved reference 'f1'
# StringType is a class, not a constructed object
# Unresolved reference 'true'
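The expected behaviour could look roughly like the sketch below. The three classes are simplified stand-ins written for this example so that it runs without Spark; they are not the PySpark classes and not the code from the linked pull request. The only point is that each __repr__() emits a Python constructor call.

```python
# Hypothetical stand-ins for the PySpark types, for illustration only.
class StringType:
    def __repr__(self):
        return "StringType()"

    def __eq__(self, other):
        return isinstance(other, StringType)


class StructField:
    def __init__(self, name, dataType, nullable=True):
        self.name = name
        self.dataType = dataType
        self.nullable = nullable

    def __repr__(self):
        # %r quotes the name and recurses into the nested type's __repr__.
        return "StructField(%r, %r, %r)" % (self.name, self.dataType, self.nullable)

    def __eq__(self, other):
        return (isinstance(other, StructField)
                and (self.name, self.dataType, self.nullable)
                == (other.name, other.dataType, other.nullable))


class StructType:
    def __init__(self, fields):
        self.fields = list(fields)

    def __repr__(self):
        # Emit a Python list literal instead of Scala's List(...).
        return "StructType([%s])" % ", ".join(repr(f) for f in self.fields)

    def __eq__(self, other):
        return isinstance(other, StructType) and self.fields == other.fields


struct1 = StructType([StructField("f1", StringType(), True)])
text = repr(struct1)
print(text)  # StructType([StructField('f1', StringType(), True)])

# The representation can be evaluated to rebuild an equal object.
struct2 = eval(text)
assert struct2 == struct1
```

With a representation of this shape, step 4 above would succeed instead of producing unresolved references.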

Issue Links

links to

[Github] Pull Request #34320 (crflynn)

People

Assignee: Romi Kuntsman
Reporter: Romi Kuntsman
Votes: 0
Watchers: 2

Dates

Created: 29/Nov/16 09:05
Updated: 23/Mar/22 14:01
Resolved: 23/Mar/22 14:01