  1. Spark
  2. SPARK-12556

Pyspark dataframe unionAll call accepts incorrect input

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      I originally encountered this problem with two dataframes that have 8 and 10 columns respectively. The following is a made-up example that reproduces what I observed going wrong.

      Consider the two dataframes:

      df1:

      +----+-------+
      | id | count |
      +----+-------+
      +----+-------+

      df2:

      +----+-----------+-------+
      | id | new_count | count |
      +----+-----------+-------+
      | 1  | 4         | 6     |
      | 1  | 5         | 6     |
      | 3  | 6         | 6     |
      | 2  | 7         | 6     |
      +----+-----------+-------+

      The call:

      df3 = df1.unionAll(df2)

      returns successfully, with df3 containing 2 columns. However, the values of some columns are now swapped with those of other columns. Based on my previous experience, I would say that df3's count column will actually hold the values of df2's new_count column.

      I believe that this call should never complete successfully in the first place and should throw an exception instead.
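
      Until the call itself rejects such input, a caller-side guard along the following lines may help. This is only a sketch: `checked_union_all` is a hypothetical helper, not part of PySpark, and it assumes that unionAll pairs columns by position rather than by name, which is consistent with the swapped values described above. It relies only on `DataFrame.columns`, `DataFrame.select` and `DataFrame.unionAll`.

```python
def checked_union_all(left, right):
    """Hypothetical guard around unionAll (not a PySpark API).

    Raises ValueError when the two inputs do not carry the same set of
    column names; otherwise reorders `right` to match `left` before the
    union, on the assumption that unionAll matches columns by position.
    """
    left_cols = list(left.columns)
    right_cols = list(right.columns)

    # Same names (in any order) are required; extra or missing columns are errors.
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError(
            "unionAll inputs have different columns: %r vs %r"
            % (left_cols, right_cols)
        )

    # Align the right-hand column order with the left-hand one.
    return left.unionAll(right.select(*left_cols))
```

      With the dataframes from the example, `checked_union_all(df1, df2)` would raise a ValueError because df2 carries the extra new_count column, instead of returning a dataframe with misplaced values.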

            People

              Assignee: Unassigned
              Reporter: Aravind B (akshan)