Description
[Starting a new thread from https://issues.apache.org/jira/browse/NUTCH-874]
One of the plugins which hasn't been ported yet is the feed parser. We could rely on the one we recently added to Tika, but there is a substantial difference between the two: the Tika feed parser generates a single, simple XHTML representation of the document in which the feed entries are represented as anchors, whereas the Nutch version creates a new document for each feed entry.
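To make the difference concrete, here is a rough sketch of the two output models, using only the JDK's DOM parser. It is neither the Tika nor the Nutch code: the class name `FeedModels`, its methods, and the minimal RSS handling (only `item`, `title` and `link` elements) are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not the Tika or Nutch implementation.
public class FeedModels {

    // Reads {title, link} pairs from the <item> elements of an RSS string.
    // Assumes every <item> has both a <title> and a <link> child.
    public static List<String[]> readItems(String rss) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(rss)));
        NodeList items = doc.getElementsByTagName("item");
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String title = item.getElementsByTagName("title").item(0).getTextContent();
            String link = item.getElementsByTagName("link").item(0).getTextContent();
            out.add(new String[] {title, link});
        }
        return out;
    }

    // "Tika-style" model: one document for the whole feed, entries as anchors.
    // No HTML escaping is done here, to keep the sketch short.
    public static String asSingleXhtml(List<String[]> items) {
        StringBuilder sb = new StringBuilder("<ul>");
        for (String[] it : items) {
            sb.append("<li><a href=\"").append(it[1]).append("\">")
              .append(it[0]).append("</a></li>");
        }
        return sb.append("</ul>").toString();
    }

    // "Nutch-style" model: one separate document per entry, keyed by entry URL.
    public static Map<String, String> asSeparateDocuments(List<String[]> items) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (String[] it : items) {
            docs.put(it[1], it[0]);
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss><channel><title>Example</title>"
                + "<item><title>First</title><link>http://example.org/1</link></item>"
                + "<item><title>Second</title><link>http://example.org/2</link></item>"
                + "</channel></rss>";
        List<String[]> items = readItems(rss);
        System.out.println(asSingleXhtml(items));
        System.out.println(asSeparateDocuments(items).keySet());
    }
}
```

With the first model the entries would only show up as outlinks of the feed document; with the second, each entry becomes a document in its own right under its own URL.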
There is also the parse-rss plugin in Nutch, which is quite similar. How does it differ from the feed plugin again? Since the Tika parser would handle all sorts of feed formats, why not simply rely on it?
Any thoughts on this?