Spark / SPARK-17237

DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.3, 2.1.1, 2.2.0
    • Component/s: SQL

    Description

      I am trying to run a pivot transformation that worked on a Spark 1.6 cluster,
      namely:

      scala> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
      res1: org.apache.spark.sql.DataFrame = [a: int, b: int, c: int]

      scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
      res2: org.apache.spark.sql.DataFrame = [a: int, 3_count(c): bigint, 3_avg(c): double, 4_count(c): bigint, 4_avg(c): double]

      scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0).show
      +---+----------+--------+----------+--------+
      |  a|3_count(c)|3_avg(c)|4_count(c)|4_avg(c)|
      +---+----------+--------+----------+--------+
      |  2|         1|     4.0|         0|     0.0|
      |  3|         0|     0.0|         1|     5.0|
      +---+----------+--------+----------+--------+

      After upgrading the environment to Spark 2.0, I get an error when executing the .na.fill method:

      scala> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
      res3: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]

      scala> res3.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
      org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`c`)`;
      at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:103)
      at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:113)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:168)
      at org.apache.spark.sql.Dataset.resolve(Dataset.scala:218)
      at org.apache.spark.sql.Dataset.col(Dataset.scala:921)
      at org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:411)
      at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:162)
      at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:159)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
      at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:159)
      at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:149)
      at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
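
The error message shows the pivoted column name `3_count(`c`)` being rejected when `na.fill` resolves it by name. One possible workaround on builds without the fix is to rename the pivoted columns to plain identifiers before filling. The sketch below shows only the renaming step in pure Scala. `ColumnNames` and `sanitize` are hypothetical helper names, not part of Spark, and the assumption that the backticks and parentheses in the generated names trigger the failure is inferred from the error message, not confirmed by the report.

```scala
// Hypothetical helper, not part of Spark: maps a generated column name such as
// "3_count(c)" or "3_count(`c`)" to one made only of letters, digits and underscores.
object ColumnNames {
  def sanitize(name: String): String =
    name
      .replaceAll("[^A-Za-z0-9_]+", "_") // collapse each run of special characters into one underscore
      .replaceAll("_+$", "")             // drop trailing underscores left by a closing ")"

  def main(args: Array[String]): Unit = {
    // Column names in the shape reported by the error message (assumed, see lead-in).
    val pivoted = Seq("a", "3_count(`c`)", "3_avg(`c`)", "4_count(`c`)", "4_avg(`c`)")
    pivoted.map(sanitize).foreach(println)
  }
}
```

With a pivoted DataFrame `df`, the sanitized names could be applied with `df.toDF(df.columns.map(ColumnNames.sanitize): _*)` before calling `.na.fill(0)`. Whether this avoids the exception on an affected 2.0.x build has not been verified here, and distinct source names could in principle collide after sanitizing.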


          People

            Assignee: Takeshi Yamamuro (maropu)
            Reporter: Jiang Qiqi (tintinlotus)