Spark / SPARK-16473

BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1, 2.0.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: ML, MLlib
    • Labels:
      None
    • Environment:

      AWS EC2 Linux instance.

      Description

      Hello,

      I am using Apache Spark 1.6.1.
      I am running the bisecting k-means algorithm on a specific dataset.
      Dataset details:
      K = 100,
      input vector = 100K * 100K,
      memory assigned = 16 GB per node,
      number of nodes = 2.

      Up to K=75 it works fine, but when I set K=100 it fails with java.util.NoSuchElementException: key not found.

      I suspect it is failing because of a lack of some resource, but the exception does not convey why this Spark job failed.

      Can someone please point me to the root cause of this exception?

      This is the exception stack trace:

      java.util.NoSuchElementException: key not found: 166 
              at scala.collection.MapLike$class.default(MapLike.scala:228) 
              at scala.collection.AbstractMap.default(Map.scala:58) 
              at scala.collection.MapLike$class.apply(MapLike.scala:141) 
              at scala.collection.AbstractMap.apply(Map.scala:58) 
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
              at scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231) 
              at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111) 
              at scala.collection.immutable.List.foldLeft(List.scala:84) 
              at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125) 
              at scala.collection.immutable.List.reduceLeft(List.scala:84) 
              at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231) 
              at scala.collection.AbstractTraversable.minBy(Traversable.scala:105) 
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337) 
              at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334) 
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
              at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) 
      

      The issue is that the job fails without giving any explicit message as to why it failed.
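      For readers unfamiliar with this exception: the top frames of the trace (MapLike.default, then MapLike.apply, called from inside minBy) suggest a plain Scala Map lookup on a key that is not in the map. The sketch below is not the Spark source. It only illustrates that pattern in standalone Scala, and all names and values in it (costs, candidates, pickUnchecked, pickChecked) are hypothetical. It also shows one possible way a lookup could report a more explicit message.

```scala
import scala.util.Try

object KeyNotFoundSketch {
  // Hypothetical per-cluster values keyed by cluster index.
  // Index 166 is deliberately absent to mirror "key not found: 166".
  val costs: Map[Long, Double] = Map(82L -> 1.5, 165L -> 0.7)

  // Hypothetical candidate indices, one of which has no entry in `costs`.
  val candidates: Seq[Long] = Seq(165L, 166L)

  // Pattern suggested by the stack trace: Map.apply inside minBy.
  // A missing key surfaces as a bare NoSuchElementException.
  def pickUnchecked(): Long = candidates.minBy(i => costs(i))

  // Defensive variant (an assumption, not the actual fix): report which
  // index was missing and which indices were available.
  def pickChecked(): Long = candidates.minBy { i =>
    costs.getOrElse(i, throw new IllegalStateException(
      s"No entry for cluster index $i; known indices: " +
        costs.keys.toSeq.sorted.mkString(", ")))
  }

  def main(args: Array[String]): Unit = {
    println(Try(pickUnchecked())) // fails with "key not found: 166"
    println(Try(pickChecked()))   // fails with the more descriptive message
  }
}
```

      This says nothing about why the key is missing in BisectingKMeans itself. It only shows why the message is so terse: Map.apply reports the missing key and nothing else.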

            People

            • Assignee: Ilya Matiach (imatiach)
            • Reporter: Alok Bhandari (alokob.be@gmail.com)
            • Votes: 0
            • Watchers: 4
