Nutch / NUTCH-2644

CrawlDbReader -dump ignores filter options


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.16
    • Component/s: crawldb
    • Labels: None

    Description

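      The CrawlDbReader ignores the filter options -status and -expr when dumping a crawldb. In the example below both filters select only db_fetched records, yet the dump contains records of every status: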

      % bin/nutch readdb crawldb/ -dump cdb.dump -status 'db_fetched' -expr 'status == "db_fetched"'
      ...
      % grep '^Status:' cdb.dump/part-r-00000 | sort | uniq -c
           10 Status: 1 (db_unfetched)
           28 Status: 2 (db_fetched)
            1 Status: 3 (db_gone)
            1 Status: 4 (db_redir_temp)
            3 Status: 7 (db_duplicate)
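
The manual check above can be turned into a small pass/fail helper, sketched here as one possible way to confirm that a filter was applied. The function name count_unexpected_status is hypothetical and not part of Nutch; the sketch assumes only the "Status: <code> (<name>)" line format visible in the dump output above.

```shell
# Hypothetical helper (not part of Nutch): count the "Status:" lines in
# CrawlDb text dump files whose status name differs from the expected one.
# Assumes the dump line format shown in the report, e.g.
#   Status: 2 (db_fetched)
count_unexpected_status() {
  expected="$1"   # status name the filter should have selected, e.g. db_fetched
  shift           # remaining arguments: dump part files to inspect
  # Field 3 of a Status line is the parenthesized status name.
  awk -v want="($expected)" \
    '/^Status:/ && $3 != want { n++ } END { print n + 0 }' "$@"
}
```

A result of 0 means every dumped record has the requested status. For the dump shown in the report, `count_unexpected_status db_fetched cdb.dump/part-r-00000` would print 15 (10 + 1 + 1 + 3), which is the symptom of this bug.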
      

            People

              Assignee: Unassigned
              Reporter: Sebastian Nagel (snagel)
              Votes: 0
              Watchers: 4