  1. Nutch
  2. NUTCH-2339

Nutch does not fetch documents with the -all argument

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.5
    • Component/s: nutchNewbie
    • Labels: None
    • Environment: Nutch 2.3.1 + Hadoop 2.7.1
    • Flags: Important

    Description

      I have deployed Nutch on a Hadoop server. Whenever I check the counts, I see a very large number of documents with status 1 and comparatively few with status 2.
      The statistics are as follows:

      { "status" : null, "count" : 16 }
      { "status" : 1, "count" : 358437 }
      { "status" : 2, "count" : 92021 }
      { "status" : 3, "count" : 7354 }
      { "status" : 4, "count" : 2807 }
      { "status" : 5, "count" : 4042 }
      { "status" : 34, "count" : 2767 }
      { "status" : 38, "count" : 229 }

      To fetch the status 1 documents successfully, I have to run the command separately; only then does it start fetching them. Is there any fix for this problem?
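
      To put the reported counts in proportion, here is a minimal Python sketch that totals them and prints each status's share. The counts are copied from the statistics in the description. The status names are an assumption based on the status codes in Nutch 2.x's `CrawlStatus` (1 = unfetched, 2 = fetched, and so on) and are not stated in the report; `summarize` is a hypothetical helper name.

```python
# Counts copied verbatim from the report; None stands for the null status.
counts = {None: 16, 1: 358437, 2: 92021, 3: 7354,
          4: 2807, 5: 4042, 34: 2767, 38: 229}

# ASSUMPTION: names follow Nutch 2.x CrawlStatus codes; not given in the report.
STATUS_NAMES = {1: "unfetched", 2: "fetched", 3: "gone", 4: "redir_temp",
                5: "redir_perm", 34: "retry", 38: "notmodified"}


def summarize(counts):
    """Return (total, rows), rows sorted by descending count.

    Each row is (status, name, count, share_of_total).
    """
    total = sum(counts.values())
    rows = [
        (status, STATUS_NAMES.get(status, "unknown"), count, count / total)
        for status, count in sorted(counts.items(), key=lambda kv: -kv[1])
    ]
    return total, rows


total, rows = summarize(counts)
print(f"total documents: {total}")
for status, name, count, share in rows:
    print(f"status {status!s:>4} ({name:<11}) {count:>7} {share:6.1%}")
```

      Under that assumed mapping, the status 1 rows are the documents still waiting to be fetched, which is the backlog the report describes.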

          People

            Assignee: Unassigned
            Reporter: Shubham Gupta (shubham.gupta)