Description
Currently, when the CrawlDbReader tool is invoked without any command line arguments, it prints the following usage message:
[mdeploy@crawl local]$ ./bin/nutch readdb
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
    <crawldb>                                    directory name where crawldb is located
    -stats [-sort]                               print overall statistics to System.out
        [-sort]                                  list status sorted by host
    -dump <out_dir> [-format normal|csv|crawldb] dump the whole db to a text file in <out_dir>
        [-format csv]                            dump in Csv format
        [-format normal]                         dump in standard format (default option)
        [-format crawldb]                        dump as CrawlDB
        [-regex <expr>]                          filter records with expression
        [-retry <num>]                           minimum retry count
        [-status <status>]                       filter records by CrawlDatum status
    -url <url>                                   print information on <url> to System.out
    -topN <nnnn> <out_dir> [<min>]               dump top <nnnn> urls sorted by score to <out_dir>
        [<min>]                                  skip records with scores below this value. This can significantly improve performance.
The lines that bother me are:
    -stats [-sort]    print overall statistics to System.out
        [-sort]       list status sorted by host
Listing -sort twice here is redundant.
Having looked through the code, I found no other optional flag that could be substituted for the second entry (which made me wonder whether it was a placeholder for something else), so it can simply be removed.
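For illustration, here is a minimal, hypothetical sketch of the change this implies, assuming the usage text is printed line by line from CrawlDbReader's main method via System.err.println; the class name and structure below are illustrative only and the real Nutch source may differ:

// Hypothetical sketch only -- not the actual Nutch source. It assumes the
// usage message is emitted as a series of println calls; the fix is simply
// to drop the second, indented "[-sort]" entry so the flag appears once.
public class CrawlDbReaderUsageSketch {
  public static void main(String[] args) {
    if (args.length < 1) {
      System.err.println("Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)");
      System.err.println("\t<crawldb>\tdirectory name where crawldb is located");
      System.err.println("\t-stats [-sort]\tprint overall statistics to System.out");
      // Removed the redundant nested entry:
      // System.err.println("\t\t[-sort]\tlist status sorted by host");
      System.err.println("\t-dump <out_dir> [-format normal|csv|crawldb]\tdump the whole db to a text file in <out_dir>");
      // ... remaining options unchanged ...
      System.err.println("\t-url <url>\tprint information on <url> to System.out");
      return;
    }
    // ... argument handling ...
  }
}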