Spark / SPARK-30065

Unable to drop na with duplicate columns

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: SQL
    • Labels: None

      Description

      Dropping rows with null values from a DataFrame that has duplicate column names fails, even when no columns are specified. This should be allowed:

      scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
      left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
      
      scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
      right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
      
      scala> val df = left.join(right, Seq("col1"))
      df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]
      
      scala> df.show
      +----+----+----+
      |col1|col2|col2|
      +----+----+----+
      |   1|null|   2|
      |   3|   4|null|
      +----+----+----+
      
      
      scala> df.na.drop("any")
      org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
        at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
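
On affected versions, one possible workaround is to give the duplicate columns unique names before calling na.drop. The sketch below is an assumption for illustration and is not taken from the issue or from its fix. It rebuilds the DataFrame from the description and renames the columns by position with toDF; the names left_col2 and right_col2 are hypothetical, as is the local session setup.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration. In spark-shell, getOrCreate
// reuses the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-30065-workaround")
  .getOrCreate()
import spark.implicits._

// Same data as in the description above.
val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
val df = left.join(right, Seq("col1"))

// toDF assigns new names by position, so the two "col2" columns are never
// looked up by name. The new names are hypothetical placeholders.
val renamed = df.toDF("col1", "left_col2", "right_col2")

// With unique names, na.drop no longer has an ambiguous reference to resolve.
val cleaned = renamed.na.drop("any")
cleaned.show()
```

With the sample data, each of the two joined rows contains a null, so the cleaned DataFrame comes out empty. The point of the sketch is only that the call completes without the AnalysisException.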
      

              People

              • Assignee: Terry Kim (imback82)
              • Reporter: Terry Kim (imback82)