Details
- Bug
- Status: Closed
- Major
- Resolution: Duplicate
- 1.3
- None
- Ubuntu Linux 10.04 server, JDK 1.6, Nutch 1.3, Solr 3.1.0
Description
Note: With protocol-http I am able to update Solr without any problems.
To test indexing with protocol-file instead, the seed list in my urls directory points to a single PDF file that I am trying to index.
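For reference, a seed list for this kind of test could be created roughly as follows. This is only a sketch: the file name seed.txt and the helper name make_seed_dir are assumptions, not part of my actual setup, and the URL in the usage comment is the one that appears in the fetcher output further down.

```shell
#!/bin/sh
# Sketch only: build a one-line seed list for a local-file crawl.
# "seed.txt" is a hypothetical file name chosen for illustration.
make_seed_dir() {
    dir="$1"   # seed directory, e.g. urls
    url="$2"   # file:// URL of the document to crawl
    mkdir -p "$dir" && printf '%s\n' "$url" > "$dir/seed.txt"
}

# Usage (uncomment to run):
# make_seed_dir urls file:///home/nutch/nutch-1.3/runtime/local/indexdir/Altec.pdf
```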
I execute:
bin/nutch crawl urls
Output:
solrUrl is not set, indexing will be skipped...
crawl started in: crawl-20110805151045
rootUrlDir = urls
threads = 10
depth = 5
solrUrl=null
Injector: starting at 2011-08-05 15:10:45
Injector: crawlDb: crawl-20110805151045/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-08-05 15:10:48, elapsed: 00:00:02
Generator: starting at 2011-08-05 15:10:48
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20110805151045/segments/20110805151050
Generator: finished at 2011-08-05 15:10:51, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-08-05 15:10:51
Fetcher: segment: crawl-20110805151045/segments/20110805151050
Fetcher: threads: 10
QueueFeeder finished: total 1 records + hit by time limit :0
fetching file:///home/nutch/nutch-1.3/runtime/local/indexdir/Altec.pdf
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-08-05 15:10:53, elapsed: 00:00:02
ParseSegment: starting at 2011-08-05 15:10:53
ParseSegment: segment: crawl-20110805151045/segments/20110805151050
ParseSegment: finished at 2011-08-05 15:10:56, elapsed: 00:00:03
CrawlDb update: starting at 2011-08-05 15:10:56
CrawlDb update: db: crawl-20110805151045/crawldb
CrawlDb update: segments: [crawl-20110805151045/segments/20110805151050]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-08-05 15:10:57, elapsed: 00:00:01
Generator: starting at 2011-08-05 15:10:57
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=1 - no more URLs to fetch.
LinkDb: starting at 2011-08-05 15:10:58
LinkDb: linkdb: crawl-20110805151045/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/home/nutch/nutch-1.3/runtime/local/crawl-20110805151045/segments/20110805151050
LinkDb: finished at 2011-08-05 15:10:59, elapsed: 00:00:01
crawl finished: crawl-20110805151045
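Before indexing, it may help to confirm that the PDF was actually fetched and parsed into the segment. The sketch below only locates the newest segment directory; the helper name latest_segment and the output directory name segdump are assumptions for illustration. The readseg and readdb calls in the usage comments are the standard Nutch 1.x inspection commands, but I have not included their output here.

```shell
#!/bin/sh
# Sketch only: locate the newest segment of a crawl so it can be inspected.
# Segment directories are named by timestamp, so a lexical sort is chronological.
latest_segment() {
    crawl_dir="$1"
    # List only directories, pick the last one, and strip the trailing slash.
    ls -d "$crawl_dir"/segments/*/ 2>/dev/null | sort | tail -n 1 | sed 's:/*$::'
}

# Usage (run from runtime/local; "segdump" is a hypothetical output directory):
# seg=$(latest_segment crawl-20110805151045)
# bin/nutch readseg -dump "$seg" segdump -nocontent
# bin/nutch readdb crawl-20110805151045/crawldb -stats
```

If the dump contains no parse data or parse text for the file, that would suggest the problem lies before the indexing step rather than in solrindex itself.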
Then, with a clean Solr index (output from stats.jsp below):
searcherName : Searcher@14dd758 main
caching : true
numDocs : 0
maxDoc : 0
reader : SolrIndexReader
readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/apache-solr-3.1.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@987197
indexVersion : 1312575204101
openedAt : Fri Aug 05 15:13:24 CDT 2011
registeredAt : Fri Aug 05 15:13:24 CDT 2011
warmupTime : 0
I then execute:
bin/nutch solrindex http://localhost:8983/solr/ crawl-20110805151045/crawldb/ crawl-20110805151045/linkdb/ crawl-20110805151045/segments/*
bin/nutch output:
SolrIndexer: starting at 2011-08-05 15:15:48
SolrIndexer: finished at 2011-08-05 15:15:50, elapsed: 00:00:01
Solr output:
Aug 5, 2011 3:15:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@15f1f9c main
Aug 5, 2011 3:15:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@15f1f9c main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@15f1f9c main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=1,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@15f1f9c main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@15f1f9c main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@15f1f9c main
Aug 5, 2011 3:15:50 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 5, 2011 3:15:50 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@15f1f9c main
Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing Searcher@14dd758 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=1,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Aug 5, 2011 3:15:50 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 8
Aug 5, 2011 3:15:50 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=8
Output from stats.jsp after running solrindex:
stats:
searcherName : Searcher@15f1f9c main
caching : true
numDocs : 0
maxDoc : 0
reader : SolrIndexReader{this=1ee148b,r=ReadOnlyDirectoryReader@1ee148b,refCnt=1,segments=0}
readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/apache-solr-3.1.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@987197
indexVersion : 1312575204101
openedAt : Fri Aug 05 15:15:50 CDT 2011
registeredAt : Fri Aug 05 15:15:50 CDT 2011
warmupTime : 2
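The same document count can also be checked from the command line instead of stats.jsp. This is a minimal sketch, assuming the default XML response of the select handler, which carries a numFound attribute on the result element; the helper name num_found is hypothetical.

```shell
#!/bin/sh
# Sketch only: read the number of matching documents from a Solr XML response.
# Assumes the response contains a <result ... numFound="N" ...> element.
num_found() {
    # Print the first numFound value seen on stdin.
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Usage (URL taken from the solrindex command above):
# curl -s 'http://localhost:8983/solr/select?q=*:*&rows=0' | num_found
```

A result of 0 after solrindex matches the numDocs : 0 shown in the stats above.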
Issue Links
- duplicates: NUTCH-1483 Can't crawl filesystem with protocol-file plugin (Closed)
- is duplicated by: NUTCH-1483 Can't crawl filesystem with protocol-file plugin (Closed)