Description
When calling Python's repr() on an object, the expected result for many types is a string that Python can evaluate (for example with eval()) to reconstruct an equivalent object.
See: https://docs.python.org/2/library/functions.html#func-repr
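As a small illustration of that convention, the sketch below round-trips two standard-library values through repr() and eval(). The variable names are chosen for this example only and are not taken from PySpark.

```python
import datetime

# A plain dict: its repr() is valid Python source for an equal dict.
original = {"name": "f1", "nullable": True}
rebuilt = eval(repr(original))

# A date: repr() yields a constructor call string, which eval() can
# run again because the `datetime` module is in scope.
day = datetime.date(2020, 1, 2)
day_text = repr(day)
day_rebuilt = eval(day_text)

print(rebuilt == original)
print(day_rebuilt == day)
```

Both comparisons hold because each repr() string is itself a Python expression that rebuilds the value.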
However, when getting a DataFrame schema in PySpark, the "__repr__()" methods of the type classes return a Scala-style string representation rather than one that is valid Python.
Relevant code in PySpark:
https://github.com/apache/spark/blob/5f02d2e5b4d37f554629cbd0e488e856fffd7b6b/python/pyspark/sql/types.py#L442
Python Code:
from pyspark.sql.types import StructType, StructField, StringType

# 1. define object
struct1 = StructType([StructField("f1", StringType(), True)])

# 2. print representation, expected to be like above
print(repr(struct1))

# 3. actual result:
# StructType(List(StructField(f1,StringType,true)))

# 4. try to use result in code
struct2 = StructType(List(StructField(f1,StringType,true)))

# 5. get bunch of errors
# Unresolved reference 'List'
# Unresolved reference 'f1'
# StringType is class, not constructed object
# Unresolved reference 'true'
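One possible shape for a Python-evaluable representation is sketched below. The three classes are simplified, hypothetical stand-ins defined only for this example so that it runs without Spark; they are not PySpark's actual implementation, and the exact output format is an assumption about what a fix might produce.

```python
# Hypothetical stand-ins for the PySpark type classes (illustration only).

class StringType:
    def __eq__(self, other):
        return isinstance(other, StringType)

    def __repr__(self):
        # Emit a constructor call, not the bare class name.
        return "StringType()"


class StructField:
    def __init__(self, name, dataType, nullable=True):
        self.name = name
        self.dataType = dataType
        self.nullable = nullable

    def __eq__(self, other):
        return (
            isinstance(other, StructField)
            and self.name == other.name
            and self.dataType == other.dataType
            and self.nullable == other.nullable
        )

    def __repr__(self):
        # !r quotes the name and renders the boolean as True/False.
        return f"StructField({self.name!r}, {self.dataType!r}, {self.nullable!r})"


class StructType:
    def __init__(self, fields):
        self.fields = list(fields)

    def __eq__(self, other):
        return isinstance(other, StructType) and self.fields == other.fields

    def __repr__(self):
        # A Python list literal instead of Scala's List(...).
        return f"StructType({self.fields!r})"


struct1 = StructType([StructField("f1", StringType(), True)])
text = repr(struct1)
struct2 = eval(text)  # succeeds because `text` is valid Python

print(text)
print(struct2 == struct1)
```

With a representation along these lines, step 4 above would no longer fail: the field name is quoted, the data type is a constructor call, the list uses Python syntax, and the boolean is spelled True.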