Nutch does not support continuous crawling out of the box. This is doable using cron, and for some crawls it is irrelevant because of their size, but it is still a nice feature to have.
This patch adds a new option to the bin/crawl script (-w|--wait) that sets how long to wait when the generator returns 0 (that is, when no URLs are scheduled for fetching).
The new option takes a value in NUMBER[SUFFIX] format. If no suffix is provided, the amount of time is assumed to be in seconds. Valid suffixes are:
s - seconds
m - minutes
h - hours
d - days
If -1 is passed to the option, or the option is not used at all, the script keeps its default behaviour of exiting.
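
To make the NUMBER[SUFFIX] rules concrete, here is a minimal sketch of how such a value could be converted to seconds in shell. It is not the code from the patch: the function name to_seconds and its error message are hypothetical, and treating an empty value the same as -1 is an assumption made for the example.

```shell
# Hypothetical helper (not taken from the patch): convert NUMBER[SUFFIX]
# to seconds. Prints -1 when waiting is disabled.
to_seconds() {
  local value="$1" number suffix

  # Assumption: an empty value is treated like -1 (do not wait, just exit).
  case "$value" in
    ""|-1) echo "-1"; return 0 ;;
  esac

  # Split off a trailing s, m, h or d; with no suffix, assume seconds.
  case "$value" in
    *[smhd]) number="${value%?}"; suffix="${value#"$number"}" ;;
    *)       number="$value";     suffix="s" ;;
  esac

  # The numeric part must be a non-empty run of digits.
  case "$number" in
    ""|*[!0-9]*) echo "invalid wait value: $value" >&2; return 1 ;;
  esac

  # Strip leading zeros so the shell does not read the number as octal.
  while [ "${#number}" -gt 1 ] && [ "${number#0}" != "$number" ]; do
    number="${number#0}"
  done

  case "$suffix" in
    s) echo "$number" ;;
    m) echo "$((number * 60))" ;;
    h) echo "$((number * 3600))" ;;
    d) echo "$((number * 86400))" ;;
  esac
}
```

A caller could then run something like `seconds=$(to_seconds "$wait_value")` (where `wait_value` is a hypothetical variable holding the option's argument) and pass the result to `sleep` when it is not -1 and the generator returned 0.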