Description
I ran into this issue when trying to index the entire title, as can be done for the content, by setting indexer.max.title.length to -1 in nutch-default.xml. Instead of leaving the title untruncated, the indexer throws the exception shown in the stack trace below. I think the title should behave the same way as the content.
To fix it, replace line 90 of BasicIndexingFilter.java:
if (title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
with this one:
if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
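For illustration, here is a minimal standalone sketch of the guarded check. The class TitleTruncation and the method truncate are hypothetical names, not part of Nutch, and the sketch assumes the truncation itself is a substring(0, MAX_TITLE_LENGTH) call, which is what the stack trace suggests.

```java
// Hypothetical helper, not Nutch code: isolates the guarded truncation logic.
final class TitleTruncation {
    private TitleTruncation() {}

    // Returns the title cut to maxLength characters.
    // A negative maxLength (e.g. -1) is treated as "no limit", so the title
    // is returned unchanged instead of reaching substring(0, -1).
    static String truncate(String title, int maxLength) {
        if (maxLength > -1 && title.length() > maxLength) { // truncate title if needed
            return title.substring(0, maxLength);
        }
        return title;
    }
}
```

Without the maxLength > -1 guard, any title satisfies title.length() > -1, so substring(0, -1) is called and throws StringIndexOutOfBoundsException.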
Stack Trace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1937)
at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Cheers.