SPARK-24213

Power Iteration Clustering in Spark ML throws an exception when the id column is IntegerType


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      Running the code below, PowerIterationClustering in Spark ML throws an AnalysisException because the output has no prediction column.

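      import org.apache.spark.ml.clustering.PowerIterationClustering

      // The "id" column is inferred as IntegerType from the Int literals.
      val data = spark.createDataFrame(Seq(
        (0, Array(1), Array(0.9)),
        (1, Array(2), Array(0.9)),
        (2, Array(3), Array(0.9)),
        (3, Array(4), Array(0.1)),
        (4, Array(5), Array(0.9))
      )).toDF("id", "neighbors", "similarities")

      // Fails at select: the transform output has no "prediction" column.
      val result = new PowerIterationClustering()
        .setK(2)
        .setMaxIter(10)
        .setInitMode("random")
        .transform(data)
        .select("id", "prediction")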
      
      org.apache.spark.sql.AnalysisException: cannot resolve '`prediction`' given input columns: [id, neighbors, similarities];;
      'Project [id#215, 'prediction]
      +- AnalysisBarrier
            +- Project [id#215, neighbors#216, similarities#217]
               +- Join Inner, (id#215 = id#234)
                  :- Project [_1#209 AS id#215, _2#210 AS neighbors#216, _3#211 AS similarities#217]
                  :  +- LocalRelation [_1#209, _2#210, _3#211]
                  +- Project [cast(id#230L as int) AS id#234]
                     +- LogicalRDD [id#230L, prediction#231], false
      
      	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
      	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
      	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
      	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
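
The logical plan above suggests where the column is lost: the prediction side of the join is projected down to `cast(id as int) AS id` alone, so `prediction` never reaches the join output. The following is a minimal sketch of that reading using plain DataFrame operations. It is not the PowerIterationClustering source, and the object name `Spark24213Sketch`, the `predictions` frame, and its values are hypothetical stand-ins chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// Hypothetical stand-alone sketch; names and data are assumptions, not Spark internals.
object Spark24213Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark-24213-sketch")
      .getOrCreate()
    import spark.implicits._

    // Input with an IntegerType id, as in the report.
    val input = Seq(
      (0, Array(1), Array(0.9)),
      (1, Array(2), Array(0.9))
    ).toDF("id", "neighbors", "similarities")

    // Stand-in for the internal result: LongType id plus a prediction.
    val predictions = Seq((0L, 0), (1L, 1)).toDF("id", "prediction")

    // Mirrors the plan in the report: only the cast id is projected,
    // so "prediction" is dropped before the join.
    val castIdOnly = predictions.select(col("id").cast(IntegerType).alias("id"))
    val broken = input.join(castIdOnly, "id")
    println(broken.columns.mkString(", "))
    // prints: id, neighbors, similarities

    // Carrying "prediction" through the cast keeps it in the join output.
    val castKeepingPrediction = predictions.select(
      col("id").cast(IntegerType).alias("id"),
      col("prediction")
    )
    val fixed = input.join(castKeepingPrediction, "id")
    println(fixed.columns.mkString(", "))
    // prints: id, neighbors, similarities, prediction

    spark.stop()
  }
}
```

Selecting `prediction` from `broken` fails with the same kind of AnalysisException as in the report, while the same select on `fixed` resolves.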
      
      

          People

            Assignee: Unassigned
            Reporter: shahid