[SPARK-30065] Unable to drop na with duplicate columns - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
Fix Version/s: 2.4.5, 3.0.0
Component/s: SQL
Labels: None

Description

Calling `na.drop` on a DataFrame that has duplicate column names fails with an ambiguous-reference error, even when no columns are specified. This should be allowed:

scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val df = left.join(right, Seq("col1"))
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]

scala> df.show
+----+----+----+
|col1|col2|col2|
+----+----+----+
|   1|null|   2|
|   3|   4|null|
+----+----+----+


scala> df.na.drop("any")
org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
  at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
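The sketch below is a simplified model in plain Scala, not Spark's code and not the patch from the linked pull requests. It shows why the failure happens and how it can be avoided. The names `Frame`, `resolve`, `dropAnyByName` and `dropAnyByPosition` are hypothetical. The model assumes that a column lookup by name fails when the name matches more than one column. Under that assumption, checking every column by position when no columns are given never has to resolve `col2`, so the duplicate names cause no error.

```scala
// Hypothetical, simplified model of a DataFrame: column names (which may
// contain duplicates) plus rows of nullable string values.
final case class Frame(columns: Seq[String], rows: Seq[Seq[Option[String]]])

object NaDropSketch {
  // Name-based lookup, loosely analogous to resolving a column reference:
  // it fails when the name matches zero or several columns.
  def resolve(frame: Frame, name: String): Either[String, Int] =
    frame.columns.zipWithIndex.filter(_._1 == name).map(_._2) match {
      case Seq(i) => Right(i)
      case Seq()  => Left(s"Cannot resolve '$name'")
      case _      => Left(s"Reference '$name' is ambiguous")
    }

  // Failing approach: resolve every column by name, then drop rows with any null.
  def dropAnyByName(frame: Frame): Either[String, Frame] = {
    val lookups = frame.columns.map(resolve(frame, _))
    lookups.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None =>
        val idx = lookups.collect { case Right(i) => i }
        Right(frame.copy(rows = frame.rows.filter(r => idx.forall(i => r(i).isDefined))))
    }
  }

  // Position-based approach: with no columns specified, check every value in
  // each row directly, so duplicate names never need to be resolved.
  def dropAnyByPosition(frame: Frame): Frame =
    frame.copy(rows = frame.rows.filter(_.forall(_.isDefined)))
}
```

In this model, the joined data from the transcript (columns `col1`, `col2`, `col2`) makes `dropAnyByName` return the ambiguity error. `dropAnyByPosition` returns no rows, because each of the two rows contains a null.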

Issue Links

relates to

SPARK-29890 Unable to fill na with 0 with duplicate columns (Resolved)

links to

GitHub Pull Request #26700

GitHub Pull Request #27411

People

Assignee: Terry Kim

Reporter: Terry Kim

Votes: 0

Watchers: 3

Dates

Created: 28/Nov/19 00:29

Updated: 08/Jun/20 16:14

Resolved: 02/Dec/19 04:28