  1. Nutch
  2. NUTCH-428

NullPointerException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.9.0
    • Component/s: fetcher
    • Labels: None
    • Environment: Windows XP

    Description

      I am using the NUTCH.Bat provided in one of the threads (I am not using Cygwin). Whenever I try to fetch the item, the fetch fails with a "NullPointerException".
      I have a URL directory that contains a urls.txt file. The file has only one entry: http://www.winzip.com/land_about.htm.
      I have updated crawl-urlfilter.txt with +^http://www.winzip.com/.

      Are there any other settings I am missing? Any help is greatly appreciated.

      The command I used to start the crawl is:
      nutch crawl urls -dir crawlResults -depth 1

      Here is my log:

      crawl started in: crawlResult
      rootUrlDir = urls
      threads = 10
      depth = 1
      Injector: starting
      Injector: crawlDb: crawlResult/crawldb
      Injector: urlDir: urls
      Injector: Converting injected urls to crawl db entries.
      Injector: Merging injected urls into crawl db.
      Injector: done
      Generator: starting
      Generator: segment: crawlResult/segments/20070110085314
      Generator: Selecting best-scoring urls due for fetch.
      Generator: Partitioning selected urls by host, for politeness.
      Generator: done.
      Fetcher: starting
      Fetcher: segment: crawlResult/segments/20070110085314
      Fetcher: threads: 10
      fetching http://www.winzip.com/land_about.htm
      fetch of http://www.winzip.com/land_about.htm failed with: java.lang.NullPointerException
      Fetcher: done
      CrawlDb update: starting
      CrawlDb update: db: crawlResult/crawldb
      CrawlDb update: segment: crawlResult/segments/20070110085314
      CrawlDb update: Merging segment data into db.
      CrawlDb update: done
      LinkDb: starting
      LinkDb: linkdb: crawlResult/linkdb
      LinkDb: adding segment: crawlResult/segments/20070110085314
      LinkDb: done
      Indexer: starting
      Indexer: linkdb: crawlResult/linkdb
      Indexer: adding segment: crawlResult/segments/20070110085314
      Optimizing index.
      Indexer: done
      Dedup: starting
      Dedup: adding indexes in: crawlResult/indexes
      Dedup: done
      Adding crawlResult/indexes/part-00000
      crawl finished: crawlResult
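The filter rule quoted in the description can be sanity-checked against the seed URL outside of Nutch. The snippet below is a minimal sketch, not Nutch's own filter code: the `accepts` helper and the `RULES` list are hypothetical names, and the matching semantics (rules tried top to bottom, first match decides, `+` accepts, `-` rejects, no match means rejected) are an assumption about how the regex URL filter behaves.

```python
import re

# Hypothetical rule list mirroring the single line quoted in the description.
# Assumption: rules are tried in order, the first pattern found in the URL
# decides, '+' accepts, '-' rejects, and a URL matching no rule is dropped.
RULES = ["+^http://www.winzip.com/"]


def accepts(url, rules=RULES):
    """Return True if the first matching rule is an accept ('+') rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched


seed = "http://www.winzip.com/land_about.htm"
print(accepts(seed))
```

Under these assumptions the seed URL is accepted, which is consistent with the log above: the URL reaches the fetcher ("fetching http://www.winzip.com/land_about.htm") before the NullPointerException is reported, so the filter rule does not appear to be what drops it.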

          People

            Assignee: Unassigned
            Reporter: Piyush (piyush111)