  1. Spark
  2. SPARK-27191

union of dataframes depends on order of the columns in 2.4.0


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      I thought this issue was resolved in 2.3.0, according to https://issues.apache.org/jira/browse/SPARK-22335, but I still hit it in 2.4.0: the result of union depends on the column order of the two DataFrames, not on the column names.

      >>> df_1 = spark.createDataFrame([["1aa", "1bbbbbbb"]], ["col1", "col2"])
      >>> df_1.show()
      +----+--------+
      |col1|    col2|
      +----+--------+
      | 1aa|1bbbbbbb|
      +----+--------+
      
      >>> df_2 = spark.createDataFrame([["2bbbbbbb", "2aa"]], ["col2", "col1"])
      >>> df_2.show()
      +--------+----+
      |    col2|col1|
      +--------+----+
      |2bbbbbbb| 2aa|
      +--------+----+
      
      >>> df_u = df_1.union(df_2)
      >>> df_u.show()
      +--------+--------+
      |    col1|    col2|
      +--------+--------+
      |     1aa|1bbbbbbb|
      |2bbbbbbb|     2aa|
      +--------+--------+
      
      >>> spark.version
      '2.4.0'
      >>>
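
The output above is consistent with union matching columns by position rather than by name. As a rough illustration, here is a plain-Python sketch of the two semantics. It needs no Spark, and the helper names `union_by_position` and `union_by_name` are hypothetical: they only model the behavior and are not Spark APIs.

```python
# Plain-Python model of the two union semantics (no Spark required).
# Each "frame" is a (columns, rows) pair; the helper names are hypothetical.


def union_by_position(left, right):
    """Model of DataFrame.union: keep left's column names, append right's rows as-is."""
    left_cols, left_rows = left
    _, right_rows = right
    return left_cols, left_rows + right_rows


def union_by_name(left, right):
    """Model of DataFrame.unionByName: reorder right's values to left's column order."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError("column names differ")
    # For each column of the left frame, find its position in the right frame.
    index = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in index) for row in right_rows]
    return left_cols, left_rows + reordered


# Same data as in the report above.
df_1 = (["col1", "col2"], [("1aa", "1bbbbbbb")])
df_2 = (["col2", "col1"], [("2bbbbbbb", "2aa")])

by_position = union_by_position(df_1, df_2)  # second row stays ("2bbbbbbb", "2aa")
by_name = union_by_name(df_1, df_2)          # second row becomes ("2aa", "2bbbbbbb")
print(by_position)
print(by_name)
```

In PySpark itself, `df_1.unionByName(df_2)` (available since 2.3.0) or `df_1.union(df_2.select(df_1.columns))` should give the name-aligned result.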
      

            People

              Assignee: Unassigned
              Reporter: Mrinal Kanti Sardar (mrinal10449)
              Votes: 0
              Watchers: 4
