- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 1.0.0
- Fix Version/s: 1.0.0
- Component/s: None
- Labels: None
- Patch Info: Patch Available
Added a '-force' option to the 'bin/nutch crawl' command line. With this option, one can crawl and recrawl in the following manner:
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5 -force
This option can be used for the first crawl too:
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5 -force
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5 -force
If the crawl directory already exists and the command is run without the -force option, the crawl fails with an error message that includes a hint to add -force:
# bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5
Exception in thread "main" java.lang.RuntimeException: crawl already exists. Add -force option to recrawl.
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:89)
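The behaviour described above can be pictured with a small standalone sketch. This is not the code from the attached patch, which is not reproduced in this issue. The class name ForceCheckSketch, the helper methods, the use of java.nio.file, and the default directory name are all assumptions made for illustration only. The sketch shows the rule: an existing crawl directory is an error unless -force was given.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not the code from the attached patch.
public class ForceCheckSketch {

    // Fails if the crawl directory already exists and -force was not given.
    static void checkCrawlDir(Path dir, boolean force) {
        if (Files.exists(dir) && !force) {
            throw new RuntimeException(
                dir + " already exists. Add -force option to recrawl.");
        }
    }

    // Returns true if "-force" appears anywhere among the arguments.
    static boolean hasForce(String[] args) {
        for (String arg : args) {
            if ("-force".equals(arg)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Default directory name is an assumption made for this sketch.
        Path dir = Paths.get("crawl");
        for (int i = 0; i < args.length - 1; i++) {
            if ("-dir".equals(args[i])) {
                dir = Paths.get(args[i + 1]);
            }
        }
        checkCrawlDir(dir, hasForce(args));
        System.out.println("crawl would proceed in: " + dir);
    }
}
```

Running the sketch with `-dir crawl` when a directory named crawl already exists throws the RuntimeException; adding `-force` lets it continue, and with a directory that does not exist yet both forms succeed.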