Nutch / NUTCH-721

Fetcher2 Slow


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1
    • Component/s: fetcher
    • Labels: None
    • Environment: Fedora Core r6, Kernel 2.6.22-14, jdk1.6.0_12

    Description

      Fetcher2 fetches far more slowly than Fetcher1.

      Config options:
      fetcher.threads.fetch = 80
      fetcher.threads.per.host = 80
      fetcher.server.delay = 0
      generate.max.per.host = 1
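
      For reference, options like these are normally set as property overrides in conf/nutch-site.xml. The fragment below is a minimal sketch, assuming the standard Hadoop-style configuration layout; it uses only the names and values listed above and is not a copy of the attached nutch-site.xml, which may contain other settings.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: the four options from this report,
     written as nutch-site.xml overrides. Not the attached file. -->
<configuration>
  <property>
    <!-- total number of fetcher threads -->
    <name>fetcher.threads.fetch</name>
    <value>80</value>
  </property>
  <property>
    <!-- threads allowed to hit the same host concurrently -->
    <name>fetcher.threads.per.host</name>
    <value>80</value>
  </property>
  <property>
    <!-- delay in seconds between requests to the same server -->
    <name>fetcher.server.delay</name>
    <value>0</value>
  </property>
  <property>
    <!-- at most one URL per host in each generated fetchlist -->
    <name>generate.max.per.host</name>
    <value>1</value>
  </property>
</configuration>
```

      With generate.max.per.host = 1, each host contributes a single URL to the fetchlist, so nearly every fetch involves a host that has not been resolved before.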

      With a queue size of ~40,000, the result is:

      activeThreads=80, spinWaiting=79, fetchQueues.totalSize=0

      with a download rate of roughly 1 page per second.

      Running with -noParse makes little difference.

      The CPU load average is around 0.2 with Fetcher2. With Fetcher1, it is around 2.0 to 3.0.

      Hosts already cached by the local caching name server appear to download quickly on a re-fetch, so the problem may be related to DNS lookups. All else being equal, however, Fetcher1 runs fast without pre-cached hosts.

      Attachments

        1. crawl_generate.tar.gz
          387 kB
          Roger Dunk
        2. NUTCH-721.patch
          1 kB
          Julien Nioche
        3. nutch-site.xml
          1 kB
          Roger Dunk


          People

            Assignee: Dogacan Guney (dogacan)
            Reporter: Roger Dunk (rogerd)
            Votes: 2
            Watchers: 5
