Nutch / NUTCH-2947

Fetcher: keep state of empty fetch queues unless queue feeder is finished

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.18
    • Fix Version/s: 1.19
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      If a fetch queue is empty (contains no fetch items), it may be removed from the list of queues. This also removes the state of the fetch queue, namely the next fetch time and the exception counter. If the queue feeder is still active, the same queue (i.e. one associated with the same host/domain/IP) may be created again after it was removed, but with fresh state. In this case, certain aspects of fetcher politeness can no longer be guaranteed:

      • the fetch delay (via earliest next fetch time) and
      • the mechanism to block fetching from the same host/domain/IP with too many exceptions (NUTCH-769).
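
The behavior asked for in the title can be illustrated with a minimal sketch. It is not the Nutch implementation: the class `QueueStateSketch`, its fields, and its methods are hypothetical names chosen for illustration. The assumed idea is simply that empty queues are purged only after the queue feeder has finished, so the next fetch time and exception counter survive while the same key could still be fed again.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names are illustrative and not taken from the Nutch code base. */
class QueueStateSketch {

  /** Per-queue politeness state that should survive while the queue is empty. */
  static final class QueueState {
    int pendingItems;    // fetch items currently waiting in this queue
    long nextFetchTime;  // earliest time (ms) at which the next fetch may start
    int exceptionCount;  // exceptions observed for this host/domain/IP
  }

  private final Map<String, QueueState> queues = new HashMap<>();
  private boolean feederFinished = false;

  /** Returns the existing queue for the key, or creates one with fresh state. */
  QueueState getOrCreate(String key) {
    return queues.computeIfAbsent(key, k -> new QueueState());
  }

  /** Called once the feeder has no more items to add (assumed signal). */
  void setFeederFinished() {
    feederFinished = true;
  }

  /**
   * Removes empty queues, but only once the feeder is finished and can no
   * longer re-create them. Returns the number of queues removed.
   */
  int purgeEmptyQueues() {
    if (!feederFinished) {
      return 0; // keep state: the same key may still receive new items
    }
    int before = queues.size();
    queues.values().removeIf(q -> q.pendingItems == 0);
    return before - queues.size();
  }

  public static void main(String[] args) {
    QueueStateSketch registry = new QueueStateSketch();
    registry.getOrCreate("foo.bar").exceptionCount = 2;

    // Feeder still active: the empty queue and its counter are kept.
    System.out.println("purged while feeding: " + registry.purgeEmptyQueues());
    System.out.println("exceptions kept: " + registry.getOrCreate("foo.bar").exceptionCount);

    // Feeder finished: the empty queue may now be dropped safely.
    registry.setFeederFinished();
    System.out.println("purged after feeding: " + registry.purgeEmptyQueues());
  }
}
```

With this design the cost is a slightly larger map while the feeder is running; in exchange, a host that has already accumulated exceptions cannot have its counter reset just because its queue ran empty for a moment.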

      The issue was observed while verifying NUTCH-2946 in the fetcher logs:

      ... 10:19:16,912 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 10:20:16,250 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
      ... 10:21:52,675 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 10:25:40,931 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 10:27:45,066 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
      ... 10:29:40,407 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
      ... 10:41:48,870 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 10:47:54,946 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 10:52:46,792 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 10:57:43,470 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:01:12,220 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:04:24,621 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:18:40,398 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:21:09,437 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:34:36,052 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:39:17,898 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:40:35,472 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:50:34,224 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:51:27,547 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:53:04,783 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 11:54:04,404 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
      ... 11:55:38,232 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
      ... 11:57:37,942 * queue foo.bar >> delayed next fetch by 116096 ms after 4 exceptions in queue
      ... 12:01:08,619 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
      ... 12:02:35,985 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
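
In the log above, the exception count for foo.bar repeatedly falls back to 1, which is the symptom of the queue state being dropped and re-created. The logged delays (50000, 79248, 100000, 116096 ms for 1 to 4 exceptions) are numerically consistent with a base delay of 50000 ms multiplied by log2(n + 1). This relation is inferred from the log values only and is an assumption, not a statement about the Nutch source; the class and method names below are hypothetical.

```java
/** Hypothetical sketch reproducing the delays seen in the log; not Nutch code. */
class ExceptionDelaySketch {

  /**
   * Delay in ms after the given number of exceptions, assuming
   * delay = baseDelayMs * log2(exceptions + 1), rounded to the nearest ms.
   */
  static long delayMs(long baseDelayMs, int exceptions) {
    double factor = Math.log(exceptions + 1) / Math.log(2);
    return Math.round(baseDelayMs * factor);
  }

  public static void main(String[] args) {
    long base = 50000L; // base delay assumed from the first log line
    for (int n = 1; n <= 4; n++) {
      System.out.println(n + " exceptions: " + delayMs(base, n) + " ms");
    }
  }
}
```

Under this assumption, a counter that is reset to 1 always yields the minimum delay of 50000 ms, so the growing back-off never takes effect for a host whose queue is repeatedly removed.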
      

            People

              Assignee: snagel Sebastian Nagel
              Reporter: snagel Sebastian Nagel
              Votes: 0
              Watchers: 4