[SPARK-25072] PySpark custom Row class can be given extra parameters

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      When a custom Row class is created in PySpark, its constructor accepts more values than there are columns. The extra values change the value of the Row (for example, in equality comparisons) but do not appear in the repr or str output, which makes errors caused by these "invisible" values hard to debug. The hidden values remain accessible through integer-based indexing.

      Some examples:

      In [69]: RowClass = Row("column1", "column2")
      
      In [70]: RowClass(1, 2) == RowClass(1, 2)
      Out[70]: True
      
      In [71]: RowClass(1, 2) == RowClass(1, 2, 3)
      Out[71]: False
      
      In [75]: RowClass(1, 2, 3)
      Out[75]: Row(column1=1, column2=2)
      
      In [76]: RowClass(1, 2)
      Out[76]: Row(column1=1, column2=2)
      
      In [77]: RowClass(1, 2, 3).asDict()
      Out[77]: {'column1': 1, 'column2': 2}
      
      In [78]: RowClass(1, 2, 3)[2]
      Out[78]: 3
      
      In [79]: repr(RowClass(1, 2, 3))
      Out[79]: 'Row(column1=1, column2=2)'
      
      In [80]: str(RowClass(1, 2, 3))
      Out[80]: 'Row(column1=1, column2=2)'
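
The behaviour in the session above can be reproduced without a Spark installation. The following is a minimal sketch, not PySpark's actual implementation: `SketchRow`, `make_row`, and the `strict` flag are hypothetical names introduced here for illustration. It assumes a row is a tuple of values with a separate list of field names, and that the repr pairs names with values, so any surplus value is stored but never printed. The `strict` check shows one possible guard, rejecting more values than fields.

```python
class SketchRow(tuple):
    """Simplified stand-in for a row: a tuple of values plus field names."""

    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        row._fields = tuple(fields)
        return row

    def __repr__(self):
        # zip() stops at the shorter sequence, so surplus values are
        # stored in the tuple but never shown here.
        pairs = ", ".join(f"{name}={value!r}"
                          for name, value in zip(self._fields, self))
        return f"Row({pairs})"


def make_row(fields, *values, strict=True):
    """Build a SketchRow; with strict=True, reject surplus values."""
    if strict and len(values) > len(fields):
        raise ValueError(
            f"expected at most {len(fields)} values for fields "
            f"{tuple(fields)}, got {len(values)}"
        )
    return SketchRow(fields, values)


fields = ("column1", "column2")

# Without the check, the third value is invisible in the repr
# but still affects equality and is reachable by index.
loose = make_row(fields, 1, 2, 3, strict=False)
print(repr(loose))
print(loose == make_row(fields, 1, 2))
print(loose[2])

# With the check, the same call fails immediately.
try:
    make_row(fields, 1, 2, 3)
except ValueError as err:
    print("rejected:", err)
```

With `strict=False` the sketch mirrors the reported behaviour; with the default, the mismatch surfaces at construction time instead of later during debugging.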
      

          People

            Assignee: Yuanjian Li (XuanYuan)
            Reporter: Jan-Willem van der Sijp (dutch_gecko)
            Votes: 0
            Watchers: 5
