[NUTCH-428] NullPointerException - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.1
Fix Version/s: 0.9.0
Component/s: fetcher
Labels:
None
Environment:

Windows XP

Description

I am using the nutch.bat script posted in one of the threads (I am not using Cygwin). Whenever I try to fetch the item, the fetch fails with a NullPointerException.
I have a URL directory that contains a urls.txt file. The file has a single entry: http://www.winzip.com/land_about.htm
I have added the rule +^http://www.winzip.com/ to crawl-urlfilter.txt.
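
As a quick sanity check on the filter rule, the following minimal Java sketch tests whether the rule above would accept the seed URL. It assumes a simplified model in which a leading "+" marks an accept rule and the rest of the line is a regular expression searched against the URL. The class name UrlFilterSketch and the helper accepts are hypothetical and are not part of Nutch.

```java
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // Hypothetical helper (not Nutch code): models a single "+" accept rule
    // by searching the URL for the regular expression that follows the "+".
    static boolean accepts(String rule, String url) {
        if (!rule.startsWith("+")) {
            return false; // only accept rules are modelled in this sketch
        }
        Pattern pattern = Pattern.compile(rule.substring(1));
        return pattern.matcher(url).find();
    }

    public static void main(String[] args) {
        String rule = "+^http://www.winzip.com/";
        String seed = "http://www.winzip.com/land_about.htm";
        // Prints whether the seed URL passes the rule under this model.
        System.out.println(accepts(rule, seed));
    }
}
```

Under this model the seed URL passes the rule, which is consistent with the log showing that the fetcher did attempt the URL before failing.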

Are there any other settings I am missing? Any help is greatly appreciated.

The command I used to start the crawl is:
nutch crawl urls -dir crawlResults -depth 1

Here is my log:

crawl started in: crawlResult
rootUrlDir = urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: crawlResult/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawlResult/segments/20070110085314
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawlResult/segments/20070110085314
Fetcher: threads: 10
fetching http://www.winzip.com/land_about.htm
fetch of http://www.winzip.com/land_about.htm failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlResult/crawldb
CrawlDb update: segment: crawlResult/segments/20070110085314
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawlResult/linkdb
LinkDb: adding segment: crawlResult/segments/20070110085314
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlResult/linkdb
Indexer: adding segment: crawlResult/segments/20070110085314
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawlResult/indexes
Dedup: done
Adding crawlResult/indexes/part-00000
crawl finished: crawlResult

People

Assignee: Unassigned

Reporter: Piyush

Votes: 0

Watchers: 0

Dates

Created: 10/Jan/07 14:56

Updated: 18/Apr/07 15:44

Resolved: 12/Jan/07 22:14