Description
I am using the NUTCH.Bat provided in one of the threads (I am not using Cygwin). Whenever I try to fetch a page, the fetch fails with a java.lang.NullPointerException.
I have a urls directory that contains a urls.txt file. The file has only one entry: http://www.winzip.com/land_about.htm
I have updated crawl-urlfilter.txt with the rule +^http://www.winzip.com/
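As a quick sanity check on the filter rule, here is a minimal sketch in Python (not Nutch code). It assumes the leading `+` marks an accept rule and that the rest of the line is a regular expression searched for within the URL. The helper name `rule_accepts` is hypothetical. The "fetching" line in the log below suggests the seed URL did get past the filter, so this only confirms that the rule itself is not the problem.

```python
import re

# Rule and seed URL copied from the post.
rule = "+^http://www.winzip.com/"
seed = "http://www.winzip.com/land_about.htm"


def rule_accepts(rule: str, url: str) -> bool:
    """Return True if an accept ('+') rule's regex is found in the URL.

    Assumption: the first character is the sign and the remainder is the
    pattern; this mirrors the rule format only, not Nutch's implementation.
    """
    sign, pattern = rule[0], rule[1:]
    return sign == "+" and re.search(pattern, url) is not None


accepted = rule_accepts(rule, seed)
print(accepted)
```

Note that the dots in the rule are unescaped, so they match any character; writing them as `\.` would make the rule stricter but should not change the result for this seed URL.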
Are there any other settings I am missing? Any help is greatly appreciated.
The command I used to start the crawl is:
nutch crawl urls -dir crawlResults -depth 1
Here is my log:
crawl started in: crawlResult
rootUrlDir = urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: crawlResult/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawlResult/segments/20070110085314
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawlResult/segments/20070110085314
Fetcher: threads: 10
fetching http://www.winzip.com/land_about.htm
fetch of http://www.winzip.com/land_about.htm failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlResult/crawldb
CrawlDb update: segment: crawlResult/segments/20070110085314
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawlResult/linkdb
LinkDb: adding segment: crawlResult/segments/20070110085314
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlResult/linkdb
Indexer: adding segment: crawlResult/segments/20070110085314
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawlResult/indexes
Dedup: done
Adding crawlResult/indexes/part-00000
crawl finished: crawlResult