Spark / SPARK-16473

BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1, 2.0.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: ML, MLlib
    • Labels: None
    • Environment: AWS EC2 Linux instance.

    Description

      Hello,

      I am using Apache Spark 1.6.1 and running the bisecting k-means algorithm on a specific dataset.
      Dataset details:
      K = 100
      input vectors = 100K x 100K
      memory assigned = 16 GB per node
      number of nodes = 2

      Up to K=75 it works fine, but when I set K=100, it fails with java.util.NoSuchElementException: key not found.

      I suspect it is failing because of a lack of some resource, but the exception does not convey why this Spark job failed.

      Could someone please point me to the root cause of this exception?

      This is the exception stack trace:

      java.util.NoSuchElementException: key not found: 166 
              at scala.collection.MapLike$class.default(MapLike.scala:228) 
              at scala.collection.AbstractMap.default(Map.scala:58) 
              at scala.collection.MapLike$class.apply(MapLike.scala:141) 
              at scala.collection.AbstractMap.apply(Map.scala:58) 
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
              at scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231) 
              at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111) 
              at scala.collection.immutable.List.foldLeft(List.scala:84) 
              at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125) 
              at scala.collection.immutable.List.reduceLeft(List.scala:84) 
              at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231) 
              at scala.collection.AbstractTraversable.minBy(Traversable.scala:105) 
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337) 
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334) 
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
              at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) 
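The top frames show where the exception originates: `MapLike.apply` falls through to `MapLike.default`, which throws when a key is absent, and this happens inside a `minBy` call in `BisectingKMeans.updateAssignments` (BisectingKMeans.scala:337-338). The `apply$mcDJ$sp` frame suggests the function passed to `minBy` is specialized as `Long => Double`, i.e. a cluster index mapped to a distance. The minimal sketch below reproduces that pattern with plain Scala collections. It is not Spark's code; `KeyNotFoundDemo`, `nearestChild`, `sqDist`, and the cluster indices are hypothetical, chosen only to mirror the shape of the trace.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical reproduction of the failing pattern in the trace (not Spark's code):
// each point goes to the nearest child cluster, and each child's center is fetched
// with Map.apply, which throws NoSuchElementException for an absent key.
object KeyNotFoundDemo {
  type Vec = Array[Double]

  // Squared Euclidean distance; enough for choosing the nearest center.
  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // centers(child) is Map.apply: it throws if `child` has no entry in `centers`.
  def nearestChild(children: List[Long], centers: Map[Long, Vec], v: Vec): Long =
    children.minBy(child => sqDist(centers(child), v))

  def main(args: Array[String]): Unit = {
    val centers = Map(164L -> Array(0.0, 0.0), 165L -> Array(5.0, 5.0))
    val v = Array(1.0, 1.0)

    // Every child has a center, so the lookup succeeds.
    println(nearestChild(List(164L, 165L), centers, v)) // prints 164

    // Child 166 has no center, so minBy hits Map.apply on a missing key.
    Try(nearestChild(List(164L, 166L), centers, v)) match {
      case Failure(e) => println(e) // prints java.util.NoSuchElementException: key not found: 166
      case Success(c) => println(s"unexpected: $c")
    }
  }
}
```

In other words, the message only says that some child cluster index had no entry in the map being queried; it does not say why that entry was missing.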
      

      The issue is that it fails without giving any explicit message about why it failed.
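A minimal plain-Scala sketch of how a lookup like the one in the trace (`Map.apply` inside `minBy`) could fail with an explicit message instead: check which child cluster indices have no center before calling `minBy`, and throw an error that names them. This is not Spark's code or the change made for this issue; `ExplicitLookupDemo`, `nearestChild`, `sqDist`, and the message format are hypothetical.

```scala
// Hypothetical sketch (not Spark's code): assign a point to its nearest child cluster,
// but report missing cluster centers explicitly instead of a bare "key not found".
object ExplicitLookupDemo {
  type Vec = Array[Double]

  // Squared Euclidean distance; enough for choosing the nearest center.
  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def nearestChild(children: List[Long], centers: Map[Long, Vec], v: Vec): Long = {
    require(children.nonEmpty, "no child clusters to choose from")
    // Collect every child index that has no center, so the error names all of them.
    val missing = children.filterNot(centers.contains)
    if (missing.nonEmpty)
      throw new IllegalStateException(
        s"No center for child cluster(s) ${missing.mkString(", ")}; " +
          s"known clusters: ${centers.keys.toList.sorted.mkString(", ")}")
    // Safe now: every child has an entry in `centers`.
    children.minBy(child => sqDist(centers(child), v))
  }

  def main(args: Array[String]): Unit = {
    val centers = Map(164L -> Array(0.0, 0.0), 165L -> Array(5.0, 5.0))
    try nearestChild(List(164L, 166L), centers, Array(1.0, 1.0))
    catch { case e: IllegalStateException => println(e.getMessage) }
  }
}
```

Failing fast keeps the job's behaviour the same (it still stops) but names the missing indices. Silently skipping children without a center would be a different policy and could assign points to the wrong cluster, so it is not shown here.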

          People

            Assignee: Ilya Matiach (imatiach)
            Reporter: Alok Bhandari (alokob.be@gmail.com)
            Votes: 0
            Watchers: 4
