Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
2.2.1
None
None
Description
I explained that I found a bug in the 2.X HostDb.
I was looking into the code within Nutch 2.X HostDbUpdateReducer and
'think' I've discovered a bug in the way we output Host data.
https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/host/HostDbUpdateReducer.java#L87
I feel that the following code
host.getInlinks().put(new Utf8(outlink), new Utf8(Integer.toString(outlinkCount.getCount(outlink))));
should be changed to the following
host.getOutlinks().put(new Utf8(outlink), new Utf8(Integer.toString(outlinkCount.getCount(outlink))));
Notice the difference: the outlink counts are now written to the Host's outlinks map instead of being put into the inlinks map a second time.
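To make the effect of the proposed change concrete, here is a minimal, self-contained sketch. It is not the Nutch code: `HostLinkSketch`, its nested `Host` class, and `recordLinks` are hypothetical stand-ins, plain `String` replaces `Utf8`, and a `Map<String, Integer>` replaces the count object that the reducer queries with `getCount`. It only illustrates that outlink counts should end up in the outlinks map and leave the inlinks map untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HostLinkSketch {

    // Hypothetical stand-in for the Nutch Host record: two separate link maps.
    public static class Host {
        public final Map<String, String> inlinks = new LinkedHashMap<>();
        public final Map<String, String> outlinks = new LinkedHashMap<>();

        public Map<String, String> getInlinks() { return inlinks; }
        public Map<String, String> getOutlinks() { return outlinks; }
    }

    // Copies the per-link counts into the host, one loop per direction.
    public static void recordLinks(Host host,
                                   Map<String, Integer> inlinkCount,
                                   Map<String, Integer> outlinkCount) {
        for (Map.Entry<String, Integer> e : inlinkCount.entrySet()) {
            host.getInlinks().put(e.getKey(), Integer.toString(e.getValue()));
        }
        for (Map.Entry<String, Integer> e : outlinkCount.entrySet()) {
            // The reported bug used getInlinks() here, so outlink counts
            // landed in the inlinks map and the outlinks map stayed empty.
            host.getOutlinks().put(e.getKey(), Integer.toString(e.getValue()));
        }
    }

    public static void main(String[] args) {
        Host host = new Host();
        recordLinks(host, Map.of("in.example", 2), Map.of("out.example", 5));
        System.out.println("inlinks=" + host.getInlinks());
        System.out.println("outlinks=" + host.getOutlinks());
    }
}
```

With the buggy variant, the same input would leave `getOutlinks()` empty and mix `out.example` into the inlinks, which is how the problem would show up in the stored Host data.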