Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
- Affects Version/s: 1.6.1
- Fix Version/s: None
- Labels: None
- Environment: CentOS
Description
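When applying unionAll to DataFrames A and B, which have the same columns but in a different order, the result maps values to the wrong columns: each value from B ends up under the column of A that sits in the same position, not under the column with the same name.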
Repro:
A.show()
+---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
|tag|year_day|tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|value|
+---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
+---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+

B.show()
+-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
|dtype|                tag|      time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
+-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
|    F|C_FNHXUT701Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUDP713.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F| C_FNHXUT718.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUT703Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUR716A.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUT803Z.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F| C_FNHXUT728.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F| C_FNHXUR806.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
+-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+

A = A.unionAll(B)
A.show()
+---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
|tag|           year_day|   tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|    value|
+---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
|  F|C_FNHXUT701Z.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
|  F|C_FNHXUDP713.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
|  F| C_FNHXUT718.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
|  F|C_FNHXUT703Z.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
|  F|C_FNHXUR716A.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
|  F|C_FNHXUT803Z.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
|  F| C_FNHXUT728.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
|  F| C_FNHXUR806.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
+---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
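To make the failure mode concrete, here is a minimal pure-Python model (no Spark involved) of a union that matches columns by position, which is what the output above shows unionAll doing: B's rows are appended unchanged and their values are then read under A's column names. The helper `positional_union` and the row representation (a list of column names plus tuples) are hypothetical, chosen only for illustration. The model also skips Spark's type coercion, so it yields 2015275 where the report shows 2015275.0.

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def positional_union(
    left_cols: Sequence[str], left_rows: List[Row],
    right_cols: Sequence[str], right_rows: List[Row],
) -> Tuple[List[str], List[Row]]:
    """Hypothetical model of a union that aligns columns by position.

    The right-hand column names are only used for the arity check; after
    that they are ignored, so each value lands under whichever left-hand
    name occupies the same position.
    """
    if len(left_cols) != len(right_cols):
        raise ValueError("union requires the same number of columns")
    return list(left_cols), left_rows + right_rows


# Column orders of A and B as printed in the report.
A_COLS = ["tag", "year_day", "tm_hour", "tm_min", "tm_sec", "dtype",
          "time", "tm_mday", "tm_mon", "tm_yday", "tm_year", "value"]
B_COLS = ["dtype", "tag", "time", "tm_hour", "tm_mday", "tm_min",
          "tm_mon", "tm_sec", "tm_yday", "tm_year", "value", "year_day"]
# First row of B from the report.
B_ROW = ("F", "C_FNHXUT701Z.CNSTLO", 1443790800, 13, 2, 0,
         10, 0, 275, 2015, 1.2345, 2015275)

# A is empty, so the union contains only B's row, labelled with A's names.
cols, rows = positional_union(A_COLS, [], B_COLS, [B_ROW])
result = dict(zip(cols, rows[0]))
print(result["tag"], result["year_day"], result["value"])
# F C_FNHXUT701Z.CNSTLO 2015275
```

The printed values match the first row of the unionAll output: `tag` holds the dtype value, `year_day` holds the tag string, and `value` holds the year_day number.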
Reordering A's columns to match B before calling unionAll works as expected:
C = A.select("dtype", "tag", "time", "tm_hour", "tm_mday", "tm_min", "tm_mon", "tm_sec", "tm_yday", "tm_year", "value", "year_day")
A = C.unionAll(B)
A.show()
+-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
|dtype|                tag|      time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
+-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
|    F|C_FNHXUT701Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUDP713.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F| C_FNHXUT718.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUT703Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUR716A.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F|C_FNHXUT803Z.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F| C_FNHXUT728.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
|    F| C_FNHXUR806.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
+-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
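The manual select() above can be generalised so the column list does not have to be typed out by hand: reorder the right-hand DataFrame to the left-hand DataFrame's `columns` order before the union. This is a minimal sketch that assumes both DataFrames have exactly the same column names. The helper name `union_by_name` is hypothetical; it only calls the standard PySpark DataFrame members `columns`, `select` and `unionAll`. Spark 2.3 and later also provide `DataFrame.unionByName`, which resolves columns by name directly, but that method does not exist in 1.6.1.

```python
def union_by_name(left, right):
    """Union two DataFrames after aligning `right` to `left`'s column order.

    Hypothetical helper, not part of PySpark. Assumes both DataFrames have
    the same set of column names; raises ValueError otherwise.
    """
    mismatched = set(left.columns) ^ set(right.columns)
    if mismatched:
        raise ValueError("column names differ: %s" % sorted(mismatched))
    # select(*names) returns a new DataFrame with columns in the given
    # order, so the positional unionAll below also lines up by name.
    return left.unionAll(right.select(*left.columns))


# Usage with the DataFrames from the report:
# A = union_by_name(A, B)
# A.show()
```

Deriving the order from `left.columns` keeps A's column order in the result, whereas the manual select() in the report adopts B's order instead. Either way, the point is that the column positions agree before unionAll runs.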
Issue Links
- is duplicated by
  - SPARK-20761 Union uses column order rather than schema (Resolved)
- is related to
  - SPARK-22335 Union for DataSet uses column order instead of types for union (Resolved)
- relates to
  - SPARK-9813 Incorrect UNION ALL behavior (Resolved)
  - SPARK-9874 UnionAll operation on DataFrame doesn't check for column names (Resolved)