Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1
    • Component/s: fetcher
    • Labels:
      None
    • Environment:

      Fedora Core r6, Kernel 2.6.22-14, jdk1.6.0_12

      Description

      Fetcher2 fetches far more slowly than Fetcher1.

      Config options:
      fetcher.threads.fetch = 80
      fetcher.threads.per.host = 80
      fetcher.server.delay = 0
      generate.max.per.host = 1
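
      For reference, the four settings above would typically be written as overrides in nutch-site.xml using the Hadoop-style property format. This is a sketch reconstructed from the values listed in this report, not a copy of the attached nutch-site.xml, which may contain other properties.

      ```xml
      <?xml version="1.0"?>
      <!-- Sketch only: names and values are taken from the list in this report;
           the attached nutch-site.xml may differ. -->
      <configuration>
        <property>
          <name>fetcher.threads.fetch</name>
          <value>80</value>
        </property>
        <property>
          <name>fetcher.threads.per.host</name>
          <value>80</value>
        </property>
        <property>
          <name>fetcher.server.delay</name>
          <value>0</value>
        </property>
        <property>
          <name>generate.max.per.host</name>
          <value>1</value>
        </property>
      </configuration>
      ```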

      With a queue size of ~40,000, the result is:

      activeThreads=80, spinWaiting=79, fetchQueues.totalSize=0

      with a download rate of roughly 1 page per second.

      Running with -noParse makes little difference.

      CPU load average is around 0.2 with Fetcher2. With Fetcher1, CPU load is around 2.0 to 3.0.

      Hosts already cached by the local caching name server (NS) appear to download quickly on a re-fetch, so the issue may be related to NS lookups. However, all other things being equal, Fetcher1 runs fast without pre-caching hosts.

        Attachments

        1. nutch-site.xml
          1 kB
          Roger Dunk
        2. NUTCH-721.patch
          1 kB
          Julien Nioche
        3. crawl_generate.tar.gz
          387 kB
          Roger Dunk

            People

            • Assignee:
              dogacan Doğacan Güney
              Reporter:
              rogerd Roger Dunk
            • Votes:
              2
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved: