Nutch / NUTCH-696

Timeout for Parser

Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: nutchbase, 1.2, nutchgora
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

    Description

      I found that parsing sometimes crashes because of a problem with a specific document. This is a shame, as it blocks the rest of the segment and Hadoop ends up deciding that the node is not responding. I was wondering whether it would make sense to have a timeout mechanism for parsing, so that if a document has not been parsed after a time t, it is simply treated as an exception and we can get on with the rest of the process.

      Does that make sense? Where do you think we should implement that, in ParseUtil?
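
To make the idea concrete, here is a minimal sketch of such a timeout using only the standard java.util.concurrent classes. It is not the attached patch and not Nutch's actual API: the class name ParseTimeoutSketch, the method runWithTimeout, and the exception type ParseFailedException are all hypothetical names chosen for illustration, and the parse call is represented by a plain Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Nutch: runs a parse task with a time limit.
class ParseTimeoutSketch {

    /** Hypothetical exception standing in for "this document could not be parsed". */
    public static class ParseFailedException extends Exception {
        public ParseFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs parseTask on a separate thread and waits at most timeoutMillis.
     * A timeout is reported as an ordinary exception so the caller can skip
     * the document and continue with the rest of the segment.
     */
    public static <T> T runWithTimeout(Callable<T> parseTask, long timeoutMillis)
            throws ParseFailedException {
        // Daemon thread: a parser that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-timeout-sketch");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread; the task must cooperate
            throw new ParseFailedException(
                    "parse timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            // The parse task itself threw; unwrap and report it the same way.
            throw new ParseFailedException("parse failed", e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new ParseFailedException("parse interrupted", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap each document's parse in runWithTimeout and catch ParseFailedException to log and skip the document. One limitation of this approach: cancelling the Future only interrupts the worker thread, so a parser stuck in a loop that never checks for interruption keeps consuming CPU in the background even though the caller has moved on.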

      Attachments

        1. timeout.patch
          3 kB
          Andrzej Bialecki

            People

              Assignee: Julien Nioche (jnioche)
              Reporter: Julien Nioche (jnioche)