SPARK-19044

PySpark dropna() can fail with AnalysisException

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark, SQL

    Description

      In PySpark, calling dropna() on the result of a self cross join fails with an AnalysisException:

      v1 = spark.range(10)
      v2 = v1.crossJoin(v1)
      v2.dropna()
      
      AnalysisException: u"Reference 'id' is ambiguous, could be: id#66L, id#69L.;"
      

      However, the equivalent Scala code works fine:

      val v1 = spark.range(10)
      val v2 = v1.crossJoin(v1)
      v1.na.drop()
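
A possible workaround sketch (an assumption, not something stated in this report): the error says the reference 'id' is ambiguous, so giving every column of the joined DataFrame a unique name before calling dropna() should avoid resolving a duplicated name. The helpers unique_names and dropna_on_self_cross_join are hypothetical names used only for illustration; crossJoin, toDF, and dropna are existing PySpark DataFrame methods.

```python
from collections import Counter


def unique_names(columns):
    """Suffix duplicated column names with their occurrence index.

    Hypothetical helper: ["id", "id"] becomes ["id_0", "id_1"];
    names that occur only once are returned unchanged.
    """
    totals = Counter(columns)   # how often each name occurs overall
    seen = Counter()            # how many occurrences were already emitted
    result = []
    for name in columns:
        if totals[name] > 1:
            result.append(f"{name}_{seen[name]}")
            seen[name] += 1
        else:
            result.append(name)
    return result


def dropna_on_self_cross_join(spark, n=10):
    """Hypothetical wrapper around the reproduction from the report.

    `spark` is assumed to be an existing SparkSession.
    """
    v1 = spark.range(n)
    v2 = v1.crossJoin(v1)                          # columns: id, id
    renamed = v2.toDF(*unique_names(v2.columns))   # positional rename
    return renamed.dropna()                        # names are now unique
```

Note that this sketch does not guard against a generated name such as id_0 colliding with a column that already has that name; a different suffix scheme may be needed in that case.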
      

          People

            Assignee: Unassigned
            Reporter: Josh Rosen (joshrosen)
            Votes: 0
            Watchers: 2
