Description
When extracting the host part of a URL fails, the Generator job fails with an NPE in the SelectorReducer. The issue is reproducible if the CrawlDb contains a malformed URL, for example a URL with an unsupported scheme (smb://).
Caused by: java.lang.NullPointerException
    at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:439)
    at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:300)
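To illustrate the failure mode described above, here is a minimal, self-contained sketch of a null-safe host extraction with per-host counting. It is not the Nutch code: the class `HostCountSketch` and the methods `hostOrNull` and `countPerHost` are hypothetical names, and the sketch assumes only standard `java.net` behaviour, where a URL with no registered protocol handler (such as smb://) cannot be converted to a `URL`.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of Nutch.
final class HostCountSketch {

    /** Returns the lower-cased host, or null if it cannot be extracted. */
    static String hostOrNull(String url) {
        try {
            // toURL() throws MalformedURLException for schemes without a
            // registered handler (e.g. smb://) and IllegalArgumentException
            // for non-absolute URIs.
            String host = new URI(url).toURL().getHost();
            return (host == null || host.isEmpty()) ? null : host.toLowerCase();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return null;
        }
    }

    /** Counts URLs per host, skipping entries whose host is unknown. */
    static Map<String, Integer> countPerHost(List<String> urls) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : urls) {
            String host = hostOrNull(url);
            if (host == null) {
                // Without this guard, calling a method on the null host
                // would raise a NullPointerException.
                continue;
            }
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }
}
```

With the guard in place, a single unparsable URL is skipped rather than aborting the whole run. Whether such URLs should be skipped, logged, or counted under a placeholder key is a design decision this sketch does not settle.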