Details
- Type: Bug
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.6.0
- Fix Version/s: None
- Component/s: None
Description
The Spark analyzer throws the following exception in a specific scenario:
Exception:
org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
Code:
SparkClient.java
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, Metadata.empty());
JavaRDD<Row> rdd = sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a", "b")));
DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new StructType(fields));
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");
DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as asd, F2 from t1");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, "t2");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");
// Join the aliased frame back to the original frame on F2, then select one column from each side.
DataFrame join = aliasedDf.join(df, aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
select.collect(); // throws the AnalysisException above
Observations:
- This issue is related to the data type of the fields in the initial DataFrame (if the data type is not String, it works).
- It works fine if the DataFrames are registered as temporary tables and the equivalent SQL (select a.asd, b.F1 from t2 a inner join t3 b on a.F2 = b.F2) is used instead; see the sketch below.
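A rough sketch of that SQL workaround, assuming the same sparkClient wrapper from the snippet above and that t2 and t3 are still registered as temporary tables:
DataFrame workaround = sparkClient.getSparkHiveContext()
        .sql("select a.asd, b.F1 from t2 a inner join t3 b on a.F2 = b.F2");
workaround.collect(); // returns a single row ["a", "a"] for the sample data instead of failing analysis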
Issue Links
- duplicates
  - SPARK-10925 Exception when joining DataFrames (Resolved)
- is duplicated by
  - SPARK-10925 Exception when joining DataFrames (Resolved)
  - SPARK-23677 Selecting columns from joined DataFrames with the same origin yields wrong results (Resolved)
- links to