  Nutch / NUTCH-2003

topN does not work correctly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Auto Closed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.5
    • Component/s: None
    • Labels: None

    Description

      I want to crawl the top 1000 URLs from the webpage table, ordered by score. This does not work correctly.

      When I use the topN parameter, it is divided by the number of map tasks (topN / maptaskcounts = maptasktopN), and every map task generates its own maptasktopN URLs. For example, with 25 map tasks and topN set to 1000, maptasktopN is calculated as 40. As a result, we do not get the 1000 highest-scored URLs overall; we get 1000 URLs made up of the 40 highest-scored URLs from each of the 25 map tasks.
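
      The difference between the two behaviours can be illustrated with a small simulation. This is only a sketch of the arithmetic described above, not Nutch's generator code: the function names, the toy URLs, and the way scores are spread across partitions are all assumptions made for illustration.

```python
import heapq

NUM_TASKS = 25        # map tasks, as in the report
URLS_PER_TASK = 100   # hypothetical number of candidate URLs seen by each task
TOP_N = 1000          # requested topN


def score_of(entry):
    """Return the score of a (url, score) pair."""
    return entry[1]


def per_task_top_n(partitions, top_n):
    """Reported behaviour: each task keeps its own top (top_n // task count)."""
    per_task_limit = top_n // len(partitions)
    selected = []
    for part in partitions:
        selected.extend(heapq.nlargest(per_task_limit, part, key=score_of))
    return selected


def global_top_n(partitions, top_n):
    """Expected behaviour: the top_n highest-scored URLs across all tasks."""
    all_entries = [entry for part in partitions for entry in part]
    return heapq.nlargest(top_n, all_entries, key=score_of)


# Toy data (an assumption): scores are skewed so that task 0 holds the
# highest-scored URLs, task 1 the next highest, and so on.
partitions = [
    [
        (f"http://example.org/{task}/{i}",
         float((NUM_TASKS - task) * URLS_PER_TASK - i))
        for i in range(URLS_PER_TASK)
    ]
    for task in range(NUM_TASKS)
]

per_task = per_task_top_n(partitions, TOP_N)
overall = global_top_n(partitions, TOP_N)

# How many of the per-task selections are really in the global top 1000?
overlap = len({url for url, _ in per_task} & {url for url, _ in overall})
print(f"selected={len(per_task)} truly_in_top_n={overlap}")
```

      With this deliberately skewed toy data, both approaches return 1000 URLs, but only 400 of the per-task selections belong to the true top 1000, because the global top 1000 sits entirely in the first 10 tasks and each of them contributes only 40 URLs.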

          People

            Assignee: Unassigned
            Reporter: Talat Uyarer
            Votes: 0
            Watchers: 3
