Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 0.7
- Fix Version/s: None
- Component/s: None
- Environment: mahout-0.7, hadoop-0.20.205.0
Description
When running the Mahout SSVD job, the Bt-job creates a lot of spills to disk. These can be minimized by tuning the Hadoop io.sort.mb parameter. However, when io.sort.mb is larger than roughly 1100 (e.g. 1500), I get the following exception:
java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper$1.collect(BtJob.java:261)
at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper$1.collect(BtJob.java:255)
at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockAccumulator.flushBlock(SparseRowBlockAccumulator.java:65)
at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockAccumulator.collect(SparseRowBlockAccumulator.java:75)
at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.map(BtJob.java:158)
at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.map(BtJob.java:102)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: next value iterator failed
at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:166)
at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$OuterProductCombiner.reduce(BtJob.java:322)
at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$OuterProductCombiner.reduce(BtJob.java:302)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1502)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockWritable.readFields(SparseRowBlockWritable.java:60)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
... 7 more
By changing this value I have already managed to reduce the number of spills from 100 (with the default io.sort.mb value) to 10, and disk usage dropped from around 7 GB for my small data set to around 900 MB, so fixing this issue might bring a big performance improvement.
I have plenty of free RAM, so this is not an out-of-memory issue.
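For reference, this is roughly how io.sort.mb can be raised when submitting the job. This is only a minimal sketch, assuming the SSVDCli driver class and a ToolRunner-based launch; the value 1024 is just an example of a setting that still works for me, not a recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli;

public class SsvdWithLargerSortBuffer {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A larger in-memory sort buffer means fewer spills to disk; values above
    // roughly 1100 MB trigger the "Spill failed" exception shown above.
    conf.setInt("io.sort.mb", 1024);
    // SSVDCli is assumed as the SSVD driver here; any ToolRunner-based job would
    // pick up the setting the same way (or via -D io.sort.mb=1024 on the command line).
    ToolRunner.run(conf, new SSVDCli(), args);
  }
}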