Description
Dropping rows with null values fails on a DataFrame that contains duplicate column names, even when no columns are specified. This should be allowed:
scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val df = left.join(right, Seq("col1"))
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]

scala> df.show
+----+----+----+
|col1|col2|col2|
+----+----+----+
|   1|null|   2|
|   3|   4|null|
+----+----+----+

scala> df.na.drop("any")
org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
  at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
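Until this is supported, one possible workaround is to give the joined columns unique names before calling na.drop. The following is a minimal sketch, not part of the original report: the object name NaDropWorkaround, the helper dropNullsWithUniqueNames, and the replacement names left_col2 and right_col2 are all hypothetical, and it assumes a local SparkSession is acceptable for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NaDropWorkaround {
  // Hypothetical helper: rename all columns positionally so every name is
  // unique, then drop rows containing any null. uniqueNames must have exactly
  // as many entries as df has columns, otherwise toDF throws.
  def dropNullsWithUniqueNames(df: DataFrame, uniqueNames: Seq[String]): DataFrame =
    df.toDF(uniqueNames: _*).na.drop("any")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("na-drop-workaround")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the report above.
    val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
    val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
    val df = left.join(right, Seq("col1"))

    // Assumed replacement names; any set of distinct names would do.
    val cleaned = dropNullsWithUniqueNames(df, Seq("col1", "left_col2", "right_col2"))
    cleaned.show()

    spark.stop()
  }
}
```

With the sample data from the report, every joined row contains a null in one of the two col2 columns, so the cleaned result should be empty. Renaming positionally with toDF avoids resolving col2 by name, which is the step that raises the ambiguity error.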
Issue Links
- relates to: SPARK-29890 Unable to fill na with 0 with duplicate columns (Resolved)