1. Nutch




The Injector takes a flat file of URLs and adds them to the crawldb as pages to be crawled.
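As a rough illustration of the description above, the sketch below reads seed lines and adds unseen URLs to a store. It is not Nutch's implementation: the dict standing in for the crawldb, the function names `parse_seed_line` and `inject`, the `"unfetched"` status string, and the handling of `#` comments and tab-separated `key=value` metadata are all assumptions made for the example.

```python
def parse_seed_line(line):
    """Split one seed-file line into (url, metadata); None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Assumed layout: URL, then optional tab-separated key=value pairs.
    url, *fields = line.split("\t")
    metadata = dict(f.split("=", 1) for f in fields if "=" in f)
    return url, metadata


def inject(seed_lines, crawldb):
    """Add each new URL to crawldb (a plain dict here) as an entry to be crawled."""
    added = 0
    for line in seed_lines:
        parsed = parse_seed_line(line)
        if parsed is None:
            continue
        url, metadata = parsed
        if url not in crawldb:  # entries already present are left untouched
            crawldb[url] = {"status": "unfetched", "metadata": metadata}
            added += 1
    return added


seeds = [
    "# seed list",
    "http://nutch.apache.org/\tnutch.score=10",
    "http://example.com/",
    "http://example.com/",  # duplicate, skipped
]
crawldb = {}
print(inject(seeds, crawldb))  # prints 2
```

In the sketch, URLs already in the store are skipped rather than overwritten, so running it twice over the same seed list adds nothing the second time.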

Issues: Unresolved

Type         Key         Summary                                                                                    Due Date
Bug          NUTCH-1472  InvalidRequestException(why:(String didn't validate.) [webpage][f][ts] failed validation)
Improvement  NUTCH-1712  Use MultipleInputs in Injector to make it a single mapreduce job
Bug          NUTCH-1746  OutOfMemoryError in Mappers

Issues: Updated recently

Type         Key         Summary                                        Updated
Bug          NUTCH-1938  Unable to load realm info from SCDynamicStore
Bug          NUTCH-1746  OutOfMemoryError in Mappers
Improvement  NUTCH-1763  Improving comments on the Injector Class

Versions: Unreleased

Status      Name   Release date
Unreleased  2.4
Unreleased  1.10
Unreleased  1.11
Unreleased  2.3.1