Description
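If I take the union of two datasets that have the same columns, the output should stay the same even if I change the order of the columns while creating the dataset. Instead, after reordering the columns of one dataset with select, the rows from the other dataset end up with age under name and name under age.
The code snippet below shows the details.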
public class Person {
    public String name;
    public String age;

    public Person(String name, String age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAge() { return age; }
    public void setAge(String age) { this.age = age; }
}
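import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class Test {
    public static void main(String[] args) throws Exception {
        // SparkConnection is the reporter's own helper (not shown) that returns a SparkSession.
        SparkSession spark = SparkConnection.getSpark();

        List<Person> list1 = new ArrayList<>();
        list1.add(new Person("kaushal", "25"));
        list1.add(new Person("aman", "26"));

        List<Person> list2 = new ArrayList<>();
        list2.add(new Person("sapan", "25"));
        list2.add(new Person("yati", "26"));

        Dataset<Person> ds1 = spark.createDataset(list1, Encoders.bean(Person.class));
        Dataset<Person> ds2 = spark.createDataset(list2, Encoders.bean(Person.class));

        ds1.show();
        ds2.show();

        // Reorder the columns of ds1 to (name, age), then union with ds2, which is still (age, name).
        ds1.select("name", "age").as(Encoders.bean(Person.class)).union(ds2).show();
    }
}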
Output:
+---+-------+
|age|   name|
+---+-------+
| 25|kaushal|
| 26|   aman|
+---+-------+

+---+-----+
|age| name|
+---+-----+
| 25|sapan|
| 26| yati|
+---+-----+

+-------+-----+
|   name|  age|
+-------+-----+
|kaushal|   25|
|   aman|   26|
|     25|sapan|
|     26| yati|
+-------+-----+
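The last table is consistent with Dataset.union matching columns by position rather than by name: ds2 is still ordered (age, name), so its values are appended as they are under the (name, age) header. The sketch below is a plain-Java model of that difference and does not use Spark. The UnionSketch class, its Table type, and both methods are hypothetical names made up for illustration. The name-based variant only approximates what the unionByName API from the linked SPARK-21043 is meant to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionSketch {

    /** Tiny stand-in for a dataset: ordered column names plus rows of values. */
    static final class Table {
        final List<String> columns;
        final List<List<String>> rows;

        Table(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** Positional union: keeps the left header and appends the right rows unchanged. */
    static Table unionByPosition(Table left, Table right) {
        if (left.columns.size() != right.columns.size()) {
            throw new IllegalArgumentException("Column counts differ");
        }
        List<List<String>> rows = new ArrayList<>(left.rows);
        rows.addAll(right.rows);
        return new Table(left.columns, rows);
    }

    /** Name-based union: reorders every right row to follow the left column order. */
    static Table unionByName(Table left, Table right) {
        List<List<String>> rows = new ArrayList<>(left.rows);
        for (List<String> row : right.rows) {
            List<String> reordered = new ArrayList<>();
            for (String column : left.columns) {
                int index = right.columns.indexOf(column);
                if (index < 0) {
                    throw new IllegalArgumentException("Missing column: " + column);
                }
                reordered.add(row.get(index));
            }
            rows.add(reordered);
        }
        return new Table(left.columns, rows);
    }

    public static void main(String[] args) {
        // Same data as the report: left is (name, age), right is still (age, name).
        Table left = new Table(Arrays.asList("name", "age"),
                Arrays.asList(Arrays.asList("kaushal", "25"), Arrays.asList("aman", "26")));
        Table right = new Table(Arrays.asList("age", "name"),
                Arrays.asList(Arrays.asList("25", "sapan"), Arrays.asList("26", "yati")));

        // prints [[kaushal, 25], [aman, 26], [25, sapan], [26, yati]]
        System.out.println(unionByPosition(left, right).rows);
        // prints [[kaushal, 25], [aman, 26], [sapan, 25], [yati, 26]]
        System.out.println(unionByName(left, right).rows);
    }
}
```

In the Spark snippet itself, a likely workaround is to apply the same select("name", "age") to ds2 before calling union, so that both sides have the same column order.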
Issue Links
- relates to
  - SPARK-21043 Add unionByName API to Dataset (Resolved)
  - SPARK-19615 Provide Dataset union convenience for divergent schema (Resolved)