Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.0.0
-
None
-
None
-
Patch Available
Description
This is a patch so that Nutch can be used with Hadoop 0.17.0. The patch is located at http://pastie.org/212001
The patch compiles and passes all current Nutch unit tests.
I have tested that the crawler side of Nutch (i.e. inject, generate, fetch, parse, and merge with the crawldb) definitely works, but I have not tested the Lucene indexing part. It might work, but it might not.
NOTE: The two main bugs that had to be overcome were not caught by any of the unit tests; they only came up during actual testing. The bugs were:
1. Changes to the Hadoop Iterator
2. Addition of Serialization to MapReduce Framework
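The description does not say what changed in the iterator. One possibility, assumed here and not confirmed by this report, is that the values iterator passed to a reducer began reusing a single object across calls to next(), so code that stores the returned references ends up with several copies of the last value. The sketch below illustrates that pitfall in plain Java with no Hadoop dependency; `Box`, `reusingIterator`, `collectByReference`, and `collectByCopy` are hypothetical names made up for the illustration and are not part of Nutch or Hadoop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReusedValueDemo {

    /** Hypothetical mutable value, standing in for a Writable-like object. */
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
        Box copy() { return new Box(value); }
    }

    /** Returns an iterator that hands back the SAME Box every time, overwritten in place. */
    static Iterator<Box> reusingIterator(final int... values) {
        final Box shared = new Box(0);
        return new Iterator<Box>() {
            private int next = 0;

            @Override
            public boolean hasNext() { return next < values.length; }

            @Override
            public Box next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.value = values[next++];
                return shared; // same instance on every call
            }
        };
    }

    /** Buggy pattern: keeps references, so every stored element aliases one object. */
    static List<Integer> collectByReference(Iterator<Box> it) {
        List<Box> kept = new ArrayList<>();
        while (it.hasNext()) kept.add(it.next());
        List<Integer> out = new ArrayList<>();
        for (Box b : kept) out.add(b.value);
        return out;
    }

    /** Safe pattern: copies each value before the iterator overwrites it. */
    static List<Integer> collectByCopy(Iterator<Box> it) {
        List<Box> kept = new ArrayList<>();
        while (it.hasNext()) kept.add(it.next().copy());
        List<Integer> out = new ArrayList<>();
        for (Box b : kept) out.add(b.value);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(reusingIterator(1, 2, 3))); // [3, 3, 3]
        System.out.println(collectByCopy(reusingIterator(1, 2, 3)));      // [1, 2, 3]
    }
}
```

A unit test that only reads each value as it arrives would pass under either behaviour, which may be why this kind of problem shows up only on a real crawl.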