Spark / SPARK-30941

PySpark Row can be instantiated with duplicate field names


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
    • Fix Version/s: 2.4.6, 3.0.0
    • Component/s: PySpark
    • Labels:

      Description

      Calling `collect()` after a join can return a Row that has two fields with the same name. Python rejects repeated keyword arguments to the Row constructor, so a Row like this cannot be built that way by hand, and the behavior appears to be unintended.

      This can cause correctness issues because the two ways of reading a value disagree: `__getitem__` returns the leftmost value, while `asDict()` returns the rightmost value (the former uses an index search over the field names; the latter builds a dictionary, where a later duplicate key overwrites the earlier one).
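The mismatch can be reproduced without Spark. The sketch below is a simplified stand-in, not PySpark's actual Row implementation: the field names and values are held in a plain list and tuple, and `lookup_by_index` and `lookup_by_dict` are hypothetical helper names chosen for illustration.

```python
# Simplified stand-in for a Row with duplicate field names.
# This is not PySpark's code; it only imitates the two lookup strategies.
fields = ["a", "b", "b"]
values = (1, 1, 2)


def lookup_by_index(name):
    # list.index returns the position of the first (leftmost) match.
    return values[fields.index(name)]


def lookup_by_dict(name):
    # When a key repeats, dict keeps the last (rightmost) value.
    return dict(zip(fields, values))[name]


print(lookup_by_index("b"))  # 1
print(lookup_by_dict("b"))   # 2
```

Both helpers agree for the unique field `a` and disagree only for the duplicated field `b`, which matches the behavior reported here.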

      >>> manual_output_row = Row(a=1, b=1, b=2)
        File "<stdin>", line 1
      SyntaxError: keyword argument repeated

      >>> input_rows = Row(a=1, b=1), Row(a=1, b=2)
      >>> df1, df2 = (spark.createDataFrame([r]) for r in input_rows)
      >>> df3 = df1.join(df2, "a")
      >>> output_row = df3.collect()[0]
      >>> output_row
      Row(a=1, b=1, b=2)
      >>> output_row["b"]
      1
      >>> output_row.asDict()["b"]
      2 

      Spark 1.6.3:

      >>> from pyspark.sql.types import Row
      >>> input_rows = Row(a=1, b=1), Row(a=1, b=2)
      >>> df1, df2 = (sqlContext.createDataFrame([r]) for r in input_rows)
      >>> df3 = df1.join(df2, "a")
      >>> output_row = df3.collect()[0]
      >>> output_row
      Row(a=1, b=1, b=2)
      >>> output_row["b"]
      1
      >>> output_row.asDict()["b"]
      2
      >>> sc.version
      u'1.6.3'
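On affected versions, one way to guard against this is to check for repeated field names before reading values by name. The following is a minimal sketch, assuming the names are available as a plain list (for a DataFrame, `df3.columns` would supply one); `duplicate_fields` is a hypothetical helper, not part of PySpark.

```python
from collections import Counter


def duplicate_fields(names):
    """Return the field names that occur more than once, in first-seen order."""
    counts = Counter(names)
    # dict.fromkeys drops repeats while keeping the original order.
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


print(duplicate_fields(["a", "b", "b"]))  # ['b']
print(duplicate_fields(["a", "b"]))       # []
```

A non-empty result signals that name-based access to the collected rows may be ambiguous.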
      

              People

              • Assignee: Hyukjin Kwon (hyukjin.kwon)
              • Reporter: David Roher (droher)