Details
Description
I have found that parsing sometimes crashes or hangs on a specific problematic document. This is unfortunate because it blocks the rest of the segment, and Hadoop eventually decides that the node is not responding. I was wondering whether it would make sense to add a timeout mechanism to parsing, so that if a document has not been parsed within a time t, it is simply treated as an exception and the rest of the process can continue.
Does that make sense? Where do you think we should implement that, in ParseUtil?
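One possible shape for such a timeout is to run each parse on a worker thread and wait on the resulting Future for at most t milliseconds. The sketch below is a minimal, self-contained illustration of that idea, not Nutch code: the class TimedParse, the method parseWithTimeout and the exception messages are hypothetical, and the Callable stands in for whatever call ParseUtil actually makes into a parser. Note that Future.cancel(true) only interrupts the worker thread; a parser stuck in a tight loop that ignores interrupts (as in NUTCH-700) keeps running in the background, but the caller is released and can move on to the next document.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Nutch): runs a parse task with a time budget. */
public class TimedParse {

    // Daemon threads so a parser that never returns cannot keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-parse");
        t.setDaemon(true);
        return t;
    });

    /** Returns the task's result, or throws if it fails or exceeds timeoutMs. */
    public static <T> T parseWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
        Future<T> future = EXECUTOR.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; a parser that ignores interrupts keeps running,
            // but the caller can treat this document as failed and continue.
            future.cancel(true);
            throw new Exception("Parsing exceeded " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            // Surface the parser's own failure as the cause.
            throw new Exception("Parsing failed", e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast "parser" completes normally.
        System.out.println(parseWithTimeout(() -> "parsed", 1000));
        // A "parser" that hangs is reported as a failure after 100 ms.
        try {
            parseWithTimeout(() -> { Thread.sleep(10_000); return "never"; }, 100);
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main should print "parsed" followed by "Parsing exceeded 100 ms". The timeout value would presumably come from configuration rather than being hard-coded; that choice is left open here.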
Attachments
Issue Links
- is related to: NUTCH-700 Neko1.9.11 goes into a loop (Closed)