Description
When listing multiple filters (e.g. scoring filters) in certain properties (listed below) in "nutch-site.xml", the value must not begin with whitespace: no spaces, newlines, or tabs may precede the first filter name.
E.g.:
This is fine:
<value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
Either of these will generate an exception:
<value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
<value>
org.apache.nutch.scoring.opic.OPICScoringFilter
myFilter
</value>
Affects these properties in "nutch-site.xml":
- indexingfilter.order
- urlnormalizer.order
- urlfilter.order
- htmlparsefilter.order
- scoring.filter.order
Solution: replaced order.split("\\s+") with order.trim().split("\\s+"), so that leading whitespace no longer produces an empty first entry. Patch provided.
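The effect of the fix can be reproduced outside Nutch. The following is a minimal sketch, not the patched Nutch code: the class OrderSplitDemo and its two methods are hypothetical names used only for illustration. It shows that Java's String.split keeps a leading empty string when the input starts with whitespace, which is presumably the entry that later fails to resolve to a filter class.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative and not taken from Nutch.
final class OrderSplitDemo {

    // Splits on whitespace without trimming, as the code did before the patch.
    static String[] splitRaw(String order) {
        return order.split("\\s+");
    }

    // Trims first, then splits, as the patch does.
    static String[] splitTrimmed(String order) {
        return order.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Value as it would be read from a multi-line <value> element.
        String order = "\norg.apache.nutch.scoring.opic.OPICScoringFilter\nmyFilter\n";

        // Three entries: the first one is the empty string.
        System.out.println(Arrays.toString(splitRaw(order)));

        // Two entries: only the filter names.
        System.out.println(Arrays.toString(splitTrimmed(order)));
    }
}
```

Trailing whitespace is harmless even without trim(), because split drops trailing empty strings; only leading whitespace yields the extra empty entry.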