[NUTCH-2644] CrawlDbReader -dump ignores filter options

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.16
    • Component/s: crawldb
    • Labels:
      None

      Description

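      CrawlDbReader ignores the filter options -status and -expr when dumping a CrawlDb. In the example below both filters select only db_fetched records, yet the dump contains records of every status: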

      % bin/nutch readdb crawldb/ -dump cdb.dump -status 'db_fetched' -expr 'status == "db_fetched"'
      ...
      % grep '^Status:' cdb.dump/part-r-00000 | sort | uniq -c
           10 Status: 1 (db_unfetched)
           28 Status: 2 (db_fetched)
            1 Status: 3 (db_gone)
            1 Status: 4 (db_redir_temp)
            3 Status: 7 (db_duplicate)
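
The check above can be wrapped in a small helper that lists only the status lines a filtered dump should not contain. This is a minimal sketch, not part of Nutch: the function name unexpected_statuses is hypothetical, the sample file is hand-made rather than real readdb output, and it assumes each dumped record carries one line of the form "Status: <code> (<name>)", as in the output above.

```shell
# Hypothetical helper (not part of Nutch): count the status lines in a dump
# file that do NOT match the expected status name.
# Assumes one "Status: <code> (<name>)" line per dumped record.
unexpected_statuses() {
  dump_file=$1
  expected=$2
  grep '^Status:' "$dump_file" | grep -v -F "($expected)" | sort | uniq -c
}

# Hand-made sample data for illustration only (not real readdb output).
sample=$(mktemp)
printf 'Status: 2 (db_fetched)\nStatus: 1 (db_unfetched)\nStatus: 2 (db_fetched)\n' > "$sample"

# Prints the offending statuses with their counts; prints nothing if the
# dump contains only the expected status.
unexpected_statuses "$sample" db_fetched
```

Run against a real dump, for example with cdb.dump/part-r-00000 and db_fetched as arguments, empty output would indicate that the filter was applied.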
      

              People

              • Assignee:
                Unassigned
                Reporter:
                snagel Sebastian Nagel
              • Votes:
                0
                Watchers:
                4
