Nutch / NUTCH-2328

GeneratorJob does not generate anything on second run

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 2.2, 2.2.1, 2.3, 2.3.1
    • Fix Version/s: 2.5
    • Component/s: generator
    • Environment: Ubuntu 16.04 / Hadoop 2.7.1

    • Patch Info: Patch Available
    • Flags: Patch, Important

    Description

      Given a topN parameter (e.g. 10), GeneratorJob fails to generate anything new on subsequent runs within the same process space.
      To reproduce the issue, submit GeneratorJob twice in succession to the M/R framework. The second run reports that it generated 0 URLs.
      The cause is the static count field (org.apache.nutch.crawl.GeneratorReducer#count), which is used to determine whether the topN value has been reached. Because the field is static, it keeps its value from the first run, so the second run in the same process sees topN as already reached.
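
      The failure mode can be sketched in isolation. The following is a minimal, self-contained illustration and not the Nutch source: StaticCountReducer, InstanceCountReducer, sampleUrls and the run* helpers are hypothetical names invented for this example. It assumes only what the description states, namely that a static counter is compared against topN and never reset between runs in the same JVM. The per-instance counter is one possible way to avoid the problem, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

class StaticCountDemo {

    /** Hypothetical stand-in for a reducer that enforces topN with a static counter. */
    static class StaticCountReducer {
        // Shared by every instance in the JVM and never reset between runs.
        static long count = 0;
        private final long topN;

        StaticCountReducer(long topN) { this.topN = topN; }

        List<String> reduce(List<String> urls) {
            List<String> generated = new ArrayList<>();
            for (String url : urls) {
                if (count >= topN) break;   // limit already reached
                generated.add(url);
                count++;
            }
            return generated;
        }
    }

    /** Same logic, but the counter belongs to the instance, so each run starts at 0. */
    static class InstanceCountReducer {
        private long count = 0;
        private final long topN;

        InstanceCountReducer(long topN) { this.topN = topN; }

        List<String> reduce(List<String> urls) {
            List<String> generated = new ArrayList<>();
            for (String url : urls) {
                if (count >= topN) break;
                generated.add(url);
                count++;
            }
            return generated;
        }
    }

    /** Builds n made-up candidate URLs. */
    static List<String> sampleUrls(int n) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            urls.add("http://example.org/page" + i);
        }
        return urls;
    }

    /** Each call models one job submission in the same process. */
    static int runStaticJob(long topN, List<String> urls) {
        return new StaticCountReducer(topN).reduce(urls).size();
    }

    static int runInstanceJob(long topN, List<String> urls) {
        return new InstanceCountReducer(topN).reduce(urls).size();
    }

    public static void main(String[] args) {
        List<String> urls = sampleUrls(12);
        System.out.println(runStaticJob(10, urls));   // first run: 10
        System.out.println(runStaticJob(10, urls));   // second run: 0
        System.out.println(runInstanceJob(10, urls)); // first run: 10
        System.out.println(runInstanceJob(10, urls)); // second run: 10
    }
}
```

      In this sketch the second static run emits nothing because count is still 10 from the first run, which matches the "generated 0 URLs" symptom described above.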

            People

              Assignee: Unassigned
              Reporter: Arthur B (arthur-evozon)
              Votes: 1
              Watchers: 3

                Time Tracking

                  Estimated: 24h
                  Remaining: 24h
                  Logged: Not Specified