Details
Description
I have data in the following JavaPairRDD:
JavaPairRDD<String,Tuple2<String,String>> MY_RDD;
I tried converting it to a Dataset<Row> with the following code:
Encoder<Tuple2<String, Tuple2<String,String>>> encoder2 =
Encoders.tuple(Encoders.STRING(), Encoders.tuple(Encoders.STRING(),Encoders.STRING()));
Dataset<Row> newDataSet = spark.createDataset(JavaPairRDD.toRDD(MY_RDD), encoder2).toDF("value1", "value2");
newDataSet.printSchema();
root
 |-- value1: string (nullable = true)
 |-- value2: struct (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- value: string (nullable = true)
After posting a StackOverflow question ("https://stackoverflow.com/questions/50834145/javapairrdd-to-datasetrow-in-spark"), I learned that the fields inside a tuple must have distinct names, whereas in this case both nested fields are generated with the same name, value. Because of this, I cannot select a specific column under value2.
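For reference, one possible way to sidestep the duplicate names is to skip the tuple encoder and build the rows against an explicit schema. The following is only a minimal sketch under assumptions, not a fix for the encoder behaviour itself: the class name PairRddToDataset, the helper toDataset, the nested field names "first" and "second", and the sample data are all hypothetical and chosen purely for illustration.

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

public class PairRddToDataset {

    // Converts the pair RDD to a Dataset<Row> using an explicit schema.
    // "first" and "second" are hypothetical names for the nested fields.
    static Dataset<Row> toDataset(SparkSession spark,
                                  JavaPairRDD<String, Tuple2<String, String>> pairs) {
        // Nested struct with two distinctly named string fields.
        StructType inner = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("first", DataTypes.StringType, true),
                DataTypes.createStructField("second", DataTypes.StringType, true)));

        // Top-level schema: a string column and the nested struct column.
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("value1", DataTypes.StringType, true),
                DataTypes.createStructField("value2", inner, true)));

        // Turn each (key, (a, b)) pair into Row(key, Row(a, b)).
        JavaRDD<Row> rows = pairs.map(t ->
                RowFactory.create(t._1(), RowFactory.create(t._2()._1(), t._2()._2())));

        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Local session used only for this illustration.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("PairRddToDataset")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Made-up sample data standing in for MY_RDD.
        JavaPairRDD<String, Tuple2<String, String>> myRdd = jsc.parallelizePairs(Arrays.asList(
                new Tuple2<String, Tuple2<String, String>>("k1", new Tuple2<String, String>("a", "b"))));

        Dataset<Row> ds = toDataset(spark, myRdd);
        ds.printSchema();
        // The nested fields can now be addressed individually.
        ds.select("value1", "value2.first").show();

        spark.stop();
    }
}
```

With distinct nested names, a selection such as value2.first is no longer ambiguous. The cost is that the schema has to be written out by hand instead of being derived from the encoder.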