Details
Improvement
Status: Closed
Minor
Resolution: Fixed
1.0.0
None
None
Patch Available
Description
Added a '-force' option to the 'bin/nutch crawl' command line. With this option, one can crawl and recrawl in the following manner:
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5 -force
This option can be used for the first crawl too:
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5 -force
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5 -force
If one tries to crawl without the -force option when the crawl directory already exists, the command aborts with an error message that points to the -force option:
# bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5
Exception in thread "main" java.lang.RuntimeException: crawl already exists. Add -force option to recrawl.
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:89)
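The behaviour described above can be sketched in a few lines of Java. This is only an illustration, not the attached patch: the class name CrawlForceSketch, the helpers hasFlag, optionValue and checkCrawlDir, and the fallback directory name "crawl" are all hypothetical, and the sketch uses java.nio.file on the local filesystem, whereas the real Crawl class may check the directory differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CrawlForceSketch {

    // Hypothetical helper: true if the given flag appears anywhere in args.
    static boolean hasFlag(String[] args, String flag) {
        for (String arg : args) {
            if (flag.equals(arg)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: returns the value following the named option,
    // or defaultValue if the option is absent or has no value.
    static String optionValue(String[] args, String name, String defaultValue) {
        for (int i = 0; i < args.length - 1; i++) {
            if (name.equals(args[i])) {
                return args[i + 1];
            }
        }
        return defaultValue;
    }

    // Refuses to reuse an existing crawl directory unless force is set.
    static void checkCrawlDir(Path dir, boolean force) {
        if (Files.exists(dir) && !force) {
            throw new RuntimeException(
                dir + " already exists. Add -force option to recrawl.");
        }
    }

    public static void main(String[] args) {
        // "crawl" as a fallback is an assumption made for this sketch only.
        Path dir = Paths.get(optionValue(args, "-dir", "crawl"));
        boolean force = hasFlag(args, "-force");
        checkCrawlDir(dir, force);
        System.out.println("Crawl may proceed in " + dir);
    }
}
```

Because the check only rejects an existing directory when -force is absent, passing -force on the first crawl is harmless, which matches the usage shown earlier.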