  1. Nutch
  2. NUTCH-1297

FetchItemQueues should select items from larger queues first

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: fetcher
    • Patch Info: Patch Available

    Description

      When a fetch covers multiple hosts with very different numbers of URLs, the large hosts can wait a long time before getFetchItem() in the FetchItemQueues class selects a URL from them, so we can give them higher priority.
      For example, suppose we have 10 URLs from host1, 1000 URLs from host2, and 5 threads. If all threads select from host1 first, the fetch takes longer overall than if the threads select from host2 first and fall back to host1 only while host2 is busy.
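
      The selection order described above can be sketched roughly as follows. This is not the attached patch and not Nutch's actual FetchItemQueues code: the class LargestQueueFirst and its methods add(), next() and finished() are hypothetical names, and the sketch assumes a simplified politeness rule of one in-flight request per host.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for per-host fetch queues. */
class LargestQueueFirst {
    // Pending URLs grouped by host.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    // Hosts with a request in flight (assumption: at most one per host).
    private final Set<String> busy = new HashSet<>();

    void add(String host, String url) {
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    /** Returns a URL from the largest non-busy, non-empty queue, or null if none is available. */
    String next() {
        String bestHost = null;
        int bestSize = 0;
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            int size = e.getValue().size();
            // Prefer the host with the most pending URLs; skip busy hosts.
            if (size > bestSize && !busy.contains(e.getKey())) {
                bestHost = e.getKey();
                bestSize = size;
            }
        }
        if (bestHost == null) {
            return null;
        }
        busy.add(bestHost);
        return queues.get(bestHost).poll();
    }

    /** Marks the host as free again once its in-flight request has completed. */
    void finished(String host) {
        busy.remove(host);
    }
}
```

      With 2 URLs queued for host1 and 5 for host2, the first call to next() takes a URL from host2, and the second falls back to host1 because host2 is busy. A linear scan over the hosts keeps the sketch short; a real fetcher with many hosts would likely want a cheaper way to find the largest queue.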

      Attachments

        1. NUTCH-1297.patch
          3 kB
          behnam nikbakht

            People

              Assignee: Unassigned
              Reporter: behnam nikbakht
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: