Spark / SPARK-18621

PySpark SQL Types (aka DataFrame Schema) have __repr__() with Scala and not Python representation


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 2.0.2
    • Fix Version/s: 3.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      For many types, Python's repr() is expected to return a string that yields an object with the same value when passed to eval().
      See: https://docs.python.org/2/library/functions.html#func-repr
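      As a minimal illustration of this round-trip property (plain Python, not PySpark code), a hypothetical Point dataclass gets an auto-generated __repr__() whose output evaluates back to an equal object:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical example class; dataclasses generate an eval-able __repr__()."""
    x: int
    y: str


p = Point(1, "a")
text = repr(p)
print(text)  # Point(x=1, y='a')

# The string is valid Python that rebuilds an equal object.
assert eval(text) == p
```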

      However, the __repr__() methods of the PySpark SQL type classes (which make up a DataFrame schema) return a Scala-style string representation rather than a Python one.

      Relevant code in PySpark:
      https://github.com/apache/spark/blob/5f02d2e5b4d37f554629cbd0e488e856fffd7b6b/python/pyspark/sql/types.py#L442
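      One way to see why the Scala-style string cannot be evaluated is to list the bare names it would look up, using Python's standard ast module. The python_style string below is an assumed eval-able form of the same schema, written by hand for comparison; it is not claimed to be the exact output of any PySpark release.

```python
from __future__ import annotations

import ast

scala_style = "StructType(List(StructField(f1,StringType,true)))"
python_style = "StructType([StructField('f1', StringType(), True)])"


def referenced_names(expr: str) -> set[str]:
    """Return every bare identifier the expression would look up when evaluated."""
    tree = ast.parse(expr, mode="eval")
    # True/False/None parse as ast.Constant (Python 3.8+), so they are not counted.
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


print(sorted(referenced_names(scala_style)))
# ['List', 'StringType', 'StructField', 'StructType', 'f1', 'true']
print(sorted(referenced_names(python_style)))
# ['StringType', 'StructField', 'StructType']
```

      The names present only in the Scala-style string are 'List', 'f1' and 'true', which correspond to the unresolved references listed in the reproduction.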

      Python code to reproduce:

      from pyspark.sql.types import StructType, StructField, StringType

      # 1. define the object
      struct1 = StructType([StructField("f1", StringType(), True)])
      # 2. print its representation; expected to look like the constructor call above
      print(repr(struct1))
      # 3. actual result:
      # StructType(List(StructField(f1,StringType,true)))
      # 4. try to use the result in code
      struct2 = StructType(List(StructField(f1,StringType,true)))
      # 5. this fails; the IDE reports a bunch of errors
      # (at runtime: NameError: name 'List' is not defined):
      # Unresolved reference 'List'
      # Unresolved reference 'f1'
      # 'StringType' is a class, not a constructed object
      # Unresolved reference 'true'
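      A minimal sketch of __repr__() methods that do round-trip through eval(). The three classes below are hypothetical, simplified stand-ins named after the PySpark types; they are not PySpark's implementation and only cover the single-field case from the reproduction above.

```python
class StringType:
    """Hypothetical stand-in for a PySpark atomic type (not the real class)."""

    def __repr__(self) -> str:
        # Emit a constructor call such as "StringType()", not the bare class name.
        return f"{type(self).__name__}()"

    def __eq__(self, other: object) -> bool:
        return type(self) is type(other)


class StructField:
    """Hypothetical stand-in holding a name, a data type and a nullability flag."""

    def __init__(self, name, dataType, nullable=True):
        self.name = name
        self.dataType = dataType
        self.nullable = nullable

    def __repr__(self) -> str:
        # !r quotes the name ('f1') and renders the flag as True/False, not true/false.
        return f"StructField({self.name!r}, {self.dataType!r}, {self.nullable!r})"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, StructField) and (
            (self.name, self.dataType, self.nullable)
            == (other.name, other.dataType, other.nullable)
        )


class StructType:
    """Hypothetical stand-in holding a list of StructField objects."""

    def __init__(self, fields=None):
        self.fields = list(fields or [])

    def __repr__(self) -> str:
        # repr() of a Python list gives "[...]" rather than Scala's "List(...)".
        return f"StructType({self.fields!r})"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, StructType) and self.fields == other.fields


struct1 = StructType([StructField("f1", StringType(), True)])
text = repr(struct1)
print(text)  # StructType([StructField('f1', StringType(), True)])

# The representation evaluates back to an equal schema object.
struct2 = eval(text)
assert struct2 == struct1
```

      Formatting each component with !r delegates to that component's own repr(), so nested types compose into one valid Python expression.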
      


          People

            Romi Kuntsman
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: