Nutch / NUTCH-1906

Typo in CrawlDbReader command line help


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10
    • Component/s: crawldb
    • Labels: None

      Description

      Currently, when the CrawlDbReader tool is invoked without any command line arguments, it prints the following usage message:

      [mdeploy@crawl local]$ ./bin/nutch readdb
      Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
      	<crawldb>	directory name where crawldb is located
      	-stats [-sort] 	print overall statistics to System.out
      		[-sort]	list status sorted by host
      	-dump <out_dir> [-format normal|csv|crawldb]	dump the whole db to a text file in <out_dir>
      		[-format csv]	dump in Csv format
      		[-format normal]	dump in standard format (default option)
      		[-format crawldb]	dump as CrawlDB
      		[-regex <expr>]	filter records with expression
      		[-retry <num>]	minimum retry count
      		[-status <status>]	filter records by CrawlDatum status
      	-url <url>	print information on <url> to System.out
      	-topN <nnnn> <out_dir> [<min>]	dump top <nnnn> urls sorted by score to <out_dir>
      		[<min>]	skip records with scores below this value.
      			This can significantly improve performance.
      

      The part that bothers me is:

      	-stats [-sort] 	print overall statistics to System.out
      		[-sort]	list status sorted by host
      

      The -sort flag is listed twice, which is redundant.
      Having looked through the code, I found no other optional flag that could take the place of the second one (I had thought it might be a placeholder for something else), so we can simply remove it.
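
      A minimal sketch of what the proposed change could look like, assuming the help text is assembled line by line. The class name UsageSketch and the method statsUsage are hypothetical and are not taken from the Nutch source; only the help strings come from the output quoted above.

```java
// Hypothetical illustration only; this is not the actual CrawlDbReader code.
class UsageSketch {

    // Builds the -stats portion of the help text with the duplicate
    // optional-flag line dropped, as the report proposes.
    static String statsUsage() {
        StringBuilder sb = new StringBuilder();
        sb.append("\t-stats [-sort] \tprint overall statistics to System.out\n");
        // The former second help line, which repeated the same flag with the
        // description "list status sorted by host", is intentionally omitted.
        return sb.toString();
    }

    public static void main(String[] args) {
        // Usage messages conventionally go to stderr.
        System.err.print(statsUsage());
    }
}
```

      Dropping the second line also drops its description ("list status sorted by host"), so a variant of this sketch could fold that wording into the remaining line instead.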

            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Lewis John McGibbney (lewismc)
