Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Incomplete
-
1.5.0, 1.5.1
-
None
-
Tested with Spark 1.5.0 and Spark 1.5.1
Description
I get an exception when joining a DataFrame with another DataFrame. The second DataFrame was created by performing an aggregation on the first DataFrame.
My complete workflow is:
- read the DataFrame
- apply an UDF on column "name"
- apply an UDF on column "surname"
- apply an UDF on column "birthDate"
- aggregate on "name" and re-join with the DF
- aggregate on "surname" and re-join with the DF
If I remove one step, the process completes normally.
Here is the exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) surname#20 missing from id#0,birthDate#3,name#10,surname#7 in operator !Project [id#0,birthDate#3,name#10,surname#20,UDF(birthDate#3) AS birthDate_cleaned#8]; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:154) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:103) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:49) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44) at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:914) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:132) at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154) at org.apache.spark.sql.DataFrame.join(DataFrame.scala:553) at org.apache.spark.sql.DataFrame.join(DataFrame.scala:520) at TestCase2$.main(TestCase2.scala:51) at TestCase2.main(TestCase2.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
I'm attaching a test case that I tried with Spark 1.5.0 and 1.5.1. Please note it used to work with version 1.4.1
Attachments
Attachments
Issue Links
- duplicates
-
SPARK-14948 Exception when joining DataFrames derived form the same DataFrame
- In Progress
- is duplicated by
-
SPARK-14948 Exception when joining DataFrames derived form the same DataFrame
- In Progress
-
SPARK-20093 Exception when Joining dataframe with another dataframe generated by applying groupBy transformation on original one
- Resolved
- relates to
-
SPARK-20073 Unexpected Cartesian product when using eqNullSafe in join with a derived table
- Resolved