Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Auto Closed
- Affects Version/s: 2.2, 2.3, 2.2.1, 2.3.1
- Environment: Ubuntu 16.04 / Hadoop 2.7.1
- Patch Info: Patch Available
- Flags: Patch, Important
Description
Given a topN parameter (e.g. 10), the GeneratorJob fails to generate anything new on subsequent runs within the same process space.
To reproduce the issue, submit the GeneratorJob twice in succession to the M/R framework. The second run reports that it generated 0 URLs.
The cause is the static count field (org.apache.nutch.crawl.GeneratorReducer#count), which is used to determine whether the topN value has been reached. Because the field is static, it keeps its value from the first run, so the second run starts with the limit already reached.
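The failure mode can be illustrated with a minimal, self-contained sketch. It does not use the Nutch or Hadoop classes: `StaticCountReducer`, `InstanceCountReducer`, `TopNDemo`, and the example URLs are hypothetical stand-ins, and the per-instance counter is only one possible remedy, not necessarily the one in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a reducer that enforces topN with a static counter. */
class StaticCountReducer {
    // Shared by every instance in the JVM and never reset between jobs.
    static long count = 0;
    private final long limit;

    StaticCountReducer(long limit) {
        this.limit = limit;
    }

    List<String> reduce(List<String> urls) {
        List<String> generated = new ArrayList<>();
        for (String url : urls) {
            if (count >= limit) {
                break; // on a second run the limit is already reached
            }
            generated.add(url);
            count++;
        }
        return generated;
    }
}

/** Same logic, but the counter belongs to the instance (assumed remedy). */
class InstanceCountReducer {
    private long count = 0;
    private final long limit;

    InstanceCountReducer(long limit) {
        this.limit = limit;
    }

    List<String> reduce(List<String> urls) {
        List<String> generated = new ArrayList<>();
        for (String url : urls) {
            if (count >= limit) {
                break;
            }
            generated.add(url);
            count++;
        }
        return generated;
    }
}

class TopNDemo {
    /** Builds n placeholder URLs under the given prefix. */
    static List<String> urls(String prefix, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(prefix + i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> batch = urls("http://example.org/page", 25);
        long topN = 10;

        // Two "jobs" in the same JVM, each with a fresh reducer instance.
        System.out.println("static, run 1:   " + new StaticCountReducer(topN).reduce(batch).size());
        System.out.println("static, run 2:   " + new StaticCountReducer(topN).reduce(batch).size());
        System.out.println("instance, run 1: " + new InstanceCountReducer(topN).reduce(batch).size());
        System.out.println("instance, run 2: " + new InstanceCountReducer(topN).reduce(batch).size());
    }
}
```

With the static counter, the first run selects 10 URLs and the second selects 0, which matches the behaviour described above. With the per-instance counter, both runs select 10. Resetting the static field at the start of each job would be another way to get the same effect.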
Attachments
Issue Links
- duplicates NUTCH-2330 GeneratorJob does not generate anything on second run (Closed)