Description
In distributed mode, CrawlDbReader / readdb -stats fails with a ClassCastException in the combiner:
17/12/08 04:57:13 INFO mapreduce.Job: Task Id : attempt_1512553291624_0022_m_000039_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.io.FloatWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatCombiner.reduce(CrawlDbReader.java:296)
    at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatCombiner.reduce(CrawlDbReader.java:222)
    at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1639)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1946)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1514)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:466)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
FloatWritables have been used since NUTCH-2470, so that is when this bug was introduced.
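The failure mode can be reproduced outside Hadoop with a minimal sketch. It assumes hypothetical stand-in classes (`Value`, `LongValue`, `FloatValue`) in place of the real `Writable`, `LongWritable` and `FloatWritable`, and it is not the actual CrawlDbReader code or the fix applied in Nutch. It only illustrates why a combiner that casts every value to a long type breaks once float values appear in the same stream, and how checking the runtime type before casting avoids the exception.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: Value, LongValue and FloatValue are hypothetical
// stand-ins for Hadoop's Writable, LongWritable and FloatWritable.
class CombinerCastSketch {

    interface Value {}

    static final class LongValue implements Value {
        final long v;
        LongValue(long v) { this.v = v; }
    }

    static final class FloatValue implements Value {
        final float v;
        FloatValue(float v) { this.v = v; }
    }

    // Mimics a combiner written when all values were longs:
    // the unconditional cast fails on the first FloatValue.
    static long sumAssumingLongs(List<Value> values) {
        long sum = 0;
        for (Value val : values) {
            sum += ((LongValue) val).v; // ClassCastException for FloatValue
        }
        return sum;
    }

    // Inspects the runtime type before casting, so mixed inputs are handled.
    static double sumByType(List<Value> values) {
        double sum = 0;
        for (Value val : values) {
            if (val instanceof LongValue) {
                sum += ((LongValue) val).v;
            } else if (val instanceof FloatValue) {
                sum += ((FloatValue) val).v;
            } else {
                throw new IllegalArgumentException("unsupported value type: " + val.getClass());
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Value> mixed = Arrays.asList(new LongValue(3), new FloatValue(1.5f));
        try {
            sumAssumingLongs(mixed);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed with ClassCastException");
        }
        System.out.println("type-checked sum: " + sumByType(mixed));
    }
}
```

In the real job, the value type may well depend on the statistics key, so a combiner could also dispatch on the key instead of using `instanceof`; which approach the project chose is not stated in this report.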
Issue Links
- incorporates NUTCH-2297 CrawlDbReader -stats wrong values for earliest fetch time and shortest interval (Closed)
- supersedes NUTCH-2297 CrawlDbReader -stats wrong values for earliest fetch time and shortest interval (Closed)