Description
Felix Neutatz reported a NullPointerException in the MutableHashTable when running the ALS algorithm. The stack trace is the following:
Caused by: java.lang.NullPointerException at org.apache.flink.runtime.operators.hash.HashPartition.spillPartition(HashPartition.java:310) at org.apache.flink.runtime.operators.hash.MutableHashTable.spillPartition(MutableHashTable.java:1094) at org.apache.flink.runtime.operators.hash.MutableHashTable.insertBucketEntry(MutableHashTable.java:927) at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:783) at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508) at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:544) at org.apache.flink.runtime.operators.hash.NonReusingBuildFirstHashMatchIterator.callWithNextKey(NonReusingBuildFirstHashMatchIterator.java:104) at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:173) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745)
He produced this error on his local machine with the following code:
implicit val env = ExecutionEnvironment.getExecutionEnvironment val links = MovieLensUtils.readLinks(movieLensDir + "links.csv") val movies = MovieLensUtils.readMovies(movieLensDir + "movies.csv") val ratings = MovieLensUtils.readRatings(movieLensDir + "ratings.csv") val tags = MovieLensUtils.readTags(movieLensDir + "tags.csv") val ratingMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt, r.rating) } val testMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt) } val als = ALS() .setIterations(10) .setNumFactors(10) .setBlocks(150) als.fit(ratingMatrix) val result = als.predict(testMatrix) result.print val risk = als.empiricalRisk(ratingMatrix).collect().apply(0) println("Empirical risk: " + risk) env.execute()