Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.7
None
None
Description
The NodeWalker used by the HeadingsParseFilter sometimes throws a NullPointerException:
2013-07-02 11:02:09,428 WARN parse.ParseUtil - Error parsing .... with org.apache.nutch.parse.tika.TikaParser@2c8b586a
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
    at java.util.concurrent.FutureTask.get(FutureTask.java:119)
    at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:162)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:963)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
Caused by: java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at org.apache.nutch.util.NodeWalker.nextNode(NodeWalker.java:75)
    at org.apache.nutch.parse.headings.HeadingsParseFilter.getElement(HeadingsParseFilter.java:84)
    at org.apache.nutch.parse.headings.HeadingsParseFilter.filter(HeadingsParseFilter.java:47)
    at org.apache.nutch.parse.HtmlParseFilters.filter(HtmlParseFilters.java:98)
    at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:210)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
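The top half of the trace is only a wrapper. The parse runs as a task on a worker thread (note the FutureTask and ThreadPoolExecutor frames), so the NullPointerException raised there reaches ParseUtil.runParser as the cause of an ExecutionException returned by Future.get(). A minimal sketch of that mechanism using plain java.util.concurrent; the class name, the 30-second timeout, and the failing task are hypothetical stand-ins, not Nutch code:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: shows how an exception thrown inside a submitted task
// surfaces as the cause of an ExecutionException on the calling thread.
public class WrappedExceptionDemo {

  // Hypothetical stand-in for a parse whose filter throws an NPE.
  public static Callable<String> failingParse() {
    return () -> {
      String missing = null;
      return missing.trim(); // throws NullPointerException on the worker thread
    };
  }

  // Runs the task on a worker thread and returns the original failure, or null on success.
  public static Throwable runAndGetCause(Callable<String> task)
      throws InterruptedException, TimeoutException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<String> future = executor.submit(task);
      future.get(30, TimeUnit.SECONDS); // hypothetical timeout
      return null;
    } catch (ExecutionException e) {
      // get() wraps whatever the task threw; the original exception is the cause.
      return e.getCause();
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    Throwable cause = runAndGetCause(failingParse());
    System.out.println(cause.getClass().getName()); // java.lang.NullPointerException
  }
}
```

So the frames worth reading are the ones after "Caused by:", which lead from NodeWalker.nextNode() into Xerces.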
This is strange: it fails only rarely, nextNode() checks hasNext() first, and, if I'm correct, there is no concurrent access.
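A minimal sketch of a stack-based, pre-order DOM walker in the style of NodeWalker, assuming it keeps pending nodes on a stack and pushes each visited node's children (the class SimpleNodeWalker and its helper methods are illustrative, not the Nutch source). It shows why the hasNext() check cannot rule out this failure: hasNext() only tests whether the stack is empty, whereas the NullPointerException in the trace is raised inside Xerces' ParentNode.item(), while the child list of a node that was already popped is being read.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative pre-order DOM walker; not the Nutch NodeWalker source.
public class SimpleNodeWalker {
  private final Deque<Node> stack = new ArrayDeque<>();

  public SimpleNodeWalker(Node root) {
    stack.push(root);
  }

  // True while nodes are waiting on the stack. This is the only thing it checks.
  public boolean hasNext() {
    return !stack.isEmpty();
  }

  // Pops the next node and pushes its children in reverse order,
  // so that they are visited in document order.
  public Node nextNode() {
    if (!hasNext()) {
      return null;
    }
    Node current = stack.pop();
    NodeList children = current.getChildNodes();
    for (int i = children.getLength() - 1; i >= 0; i--) {
      // In the reported trace the NPE is thrown inside an item() call like this one,
      // i.e. after hasNext() has already returned true.
      stack.push(children.item(i));
    }
    return current;
  }

  // Parses an XML string with the JDK's default DOM parser.
  public static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }

  // Collects element names in the order the walker visits them.
  public static List<String> elementNames(Node root) {
    List<String> names = new ArrayList<>();
    SimpleNodeWalker walker = new SimpleNodeWalker(root);
    while (walker.hasNext()) {
      Node node = walker.nextNode();
      if (node.getNodeType() == Node.ELEMENT_NODE) {
        names.add(node.getNodeName());
      }
    }
    return names;
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse("<html><body><h1>A</h1><p>x</p><h2>B</h2></body></html>");
    System.out.println(elementNames(doc)); // [html, body, h1, p, h2]
  }
}
```

In other words, a guard on the walker's own stack and a failure inside the DOM implementation's child-list lookup are independent; the first being true says nothing about the second.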