Description
Having
case class R(id: String)
val ds = spark.createDataset(Seq(R("1")))
This works:
scala> ds.withColumn("n", ds.col("id"))
res16: org.apache.spark.sql.DataFrame = [id: string, n: string]
but when we map over ds first and reference the mapped Dataset's column, it fails:
scala> ds.withColumn("n", ds.map(a => a).col("id"))
org.apache.spark.sql.AnalysisException: resolved attribute(s) id#55 missing from id#4 in operator !Project [id#4, id#55 AS n#57];;
!Project [id#4, id#55 AS n#57]
+- LocalRelation [id#4]
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:347)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2884)
  at org.apache.spark.sql.Dataset.select(Dataset.scala:1150)
  at org.apache.spark.sql.Dataset.withColumn(Dataset.scala:1905)
  ... 48 elided
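The analyzer rejects the query because ds.map(a => a) produces a new logical plan with a fresh attribute (id#55), which cannot be resolved against the original ds plan (id#4). A possible workaround (not from the original report, names are illustrative) is to call withColumn on the mapped Dataset itself, so the column reference and the plan it resolves against are the same:

case class R(id: String)

// assumes an existing SparkSession named `spark`
import spark.implicits._

val ds = spark.createDataset(Seq(R("1")))

// Resolve the column against the mapped Dataset's own plan,
// instead of mixing attributes from two different plans:
val mapped = ds.map(a => a)
val ok = mapped.withColumn("n", mapped.col("id"))
// ok: org.apache.spark.sql.DataFrame = [id: string, n: string]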