Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.9.0
None
None
Description
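Upgrade Nutch to Hadoop 0.7 and replace all occurrences of UTF8 with Text. UTF8 is deprecated in Hadoop, and its use is discouraged because of its limitations. For example, it stores the string length in two bytes, which caps a serialized string at 65,535 bytes.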
This change breaks the API. All third-party additions (for example, plugins) will have to be updated to the new APIs, which take Text instead of UTF8 as method parameters.
It also breaks backward compatibility of existing data in CrawlDb, LinkDb and segments. A tool to upgrade CrawlDb, LinkDb and segments could be written to ease the upgrade path.
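At its core, an upgrade tool like the one suggested above would have to re-encode every serialized string field. The following is a hypothetical, dependency-free sketch of that step for a single field. It is not Nutch's or Hadoop's code. It assumes two on-disk layouts. In the UTF8 layout, an unsigned 16-bit byte length is followed by 1 to 3 bytes per Java char (CESU-8 style). In the Text layout, a Hadoop-style zero-compressed variable-length int is followed by standard UTF-8 bytes. The class and method names are invented for illustration. A real tool would also need to rewrite the SequenceFile/MapFile headers that record the key and value classes, and that part is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: converts one UTF8-layout string field to the Text layout. */
public class Utf8ToTextSketch {

    /** Reads an assumed UTF8-layout field: unsigned 16-bit length, then 1-3 byte chars. */
    static String readUtf8Field(DataInput in) throws IOException {
        int len = in.readUnsignedShort(); // two-byte length: at most 65,535 bytes
        byte[] buf = new byte[len];
        in.readFully(buf);
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < len) { // assumes well-formed input; no validation
            int b = buf[i] & 0xFF;
            if (b < 0x80) {            // 0xxxxxxx
                sb.append((char) b);
                i += 1;
            } else if (b < 0xE0) {     // 110xxxxx 10xxxxxx
                sb.append((char) (((b & 0x1F) << 6) | (buf[i + 1] & 0x3F)));
                i += 2;
            } else {                   // 1110xxxx 10xxxxxx 10xxxxxx (incl. lone surrogates)
                sb.append((char) (((b & 0x0F) << 12)
                        | ((buf[i + 1] & 0x3F) << 6) | (buf[i + 2] & 0x3F)));
                i += 3;
            }
        }
        return sb.toString();
    }

    /** Writes a non-negative int in a Hadoop-style zero-compressed VInt encoding. */
    static void writeVInt(DataOutput out, int value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        if (value <= 127) {            // small values fit in a single byte
            out.writeByte(value);
            return;
        }
        int marker = -112;             // first byte encodes how many bytes follow
        for (int tmp = value; tmp != 0; tmp >>= 8) {
            marker--;
        }
        out.writeByte(marker);
        for (int idx = -(marker + 112); idx != 0; idx--) {
            out.writeByte(value >> ((idx - 1) * 8)); // big-endian payload bytes
        }
    }

    /** Writes a field in the assumed Text layout: VInt byte length, then UTF-8 bytes. */
    static void writeTextField(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);
        out.write(utf8);
    }

    /** Converts one serialized UTF8-layout field into a serialized Text-layout field. */
    static byte[] convert(byte[] utf8Field) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(utf8Field));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeTextField(new DataOutputStream(bytes), readUtf8Field(in));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "hello" in the UTF8 layout: length 0x0005, then the ASCII bytes.
        byte[] utf8Field = {0x00, 0x05, 'h', 'e', 'l', 'l', 'o'};
        StringBuilder hex = new StringBuilder();
        for (byte b : convert(utf8Field)) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "05 68 65 6c 6c 6f"
    }
}
```

The sketch decodes to a Java String and re-encodes with `StandardCharsets.UTF_8`. That way the JDK merges any surrogate pair, which the UTF8 layout stores as two 3-byte sequences, into a single 4-byte UTF-8 sequence. For plain ASCII keys such as URLs, only the length prefix changes.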