  1. Nutch
  2. NUTCH-1773

Solr Indexer fails

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3
    • Component/s: indexer
    • Labels: None
    • Environment: Ubuntu 12.04 LTS, java version "1.7.0_55", HBase 0.90.6 (pseudo-distributed), Hadoop 1.2.1, Solr 4.6

    Description

      When running the crawl script, or the Solr indexer by itself (bin/nutch solrindex), in local mode, it fails with:

      hduser@bl4ck1c3:~/nutch-2.3/runtime/local$ bin/nutch solrindex TestCrawl18 -reindex
      IndexingJob: starting
      Active IndexWriters :
      SOLRIndexWriter
      solr.server.url : URL of the SOLR instance (mandatory)
      solr.commit.size : buffer size when sending to SOLR (default 1000)
      solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
      solr.auth : use authentication (default false)
      solr.auth.username : use authentication (default false)
      solr.auth : username for authentication
      solr.auth.password : password for authentication

      SolrIndexerJob: java.lang.IllegalStateException: Target host must not be null, or set in parameters.
      at org.apache.http.impl.client.DefaultRequestDirector.determineRoute(DefaultRequestDirector.java:787)
      at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:414)
      at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
      at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
      at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:393)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
      at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
      at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
      at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
      at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:146)
      at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:127)
      at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:171)
      at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:187)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:196)
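
A note on the exception above: "Target host must not be null" is what the HTTP client reports when the URL it was handed has no usable scheme and host. The sketch below shows how a bare name differs from a full URL in that respect, using only `java.net.URI`. The class and method names are hypothetical, `http://localhost:8983/solr` is an assumed example address, and the string `TestCrawl18` is borrowed from the command line above purely as sample input. Whether that argument is what ended up as `solr.server.url` in this run is an assumption, not something the report confirms.

```java
import java.net.URI;

class TargetHostCheck {
    // Returns true when the string parses as an absolute URI that carries a host,
    // which is the minimum an HTTP client needs to choose a target host.
    static boolean hasTargetHost(String url) {
        try {
            URI uri = URI.create(url);
            return uri.isAbsolute() && uri.getHost() != null;
        } catch (IllegalArgumentException e) {
            // The string is not a syntactically valid URI at all.
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare name parses as a relative path: no scheme, no host.
        System.out.println(hasTargetHost("TestCrawl18"));                // prints false
        // Without "http://", "localhost" is read as the scheme, so the host is still null.
        System.out.println(hasTargetHost("localhost:8983/solr"));        // prints false
        // A full URL (assumed example address) has both scheme and host.
        System.out.println(hasTargetHost("http://localhost:8983/solr")); // prints true
    }
}
```

Running a check like this against the configured value of `solr.server.url` is a quick way to rule out a malformed or missing URL before looking elsewhere.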

      When using the new index command (bin/nutch index), it finishes, but nothing is added to Solr:

      hduser@bl4ck1c3:~/nutch-2.3/runtime/local$ bin/nutch index TestCrawl18 -reindex
      IndexingJob: starting
      Active IndexWriters :
      SOLRIndexWriter
      solr.server.url : URL of the SOLR instance (mandatory)
      solr.commit.size : buffer size when sending to SOLR (default 1000)
      solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
      solr.auth : use authentication (default false)
      solr.auth.username : use authentication (default false)
      solr.auth : username for authentication
      solr.auth.password : password for authentication

      Log shows:

      2014-05-13 03:01:13,781 INFO indexer.IndexingJob - IndexingJob: starting
      2014-05-13 03:01:14,108 INFO indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
      2014-05-13 03:01:14,109 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
      2014-05-13 03:01:14,109 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
      2014-05-13 03:01:14,335 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
      2014-05-13 03:01:14,336 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
      2014-05-13 03:01:14,336 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
      2014-05-13 03:01:14,620 WARN zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
      2014-05-13 03:01:14,768 WARN zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
      2014-05-13 03:01:14,968 WARN zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
      2014-05-13 03:01:15,243 WARN zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
      2014-05-13 03:01:15,276 WARN zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
      2014-05-13 03:01:15,326 WARN zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
      2014-05-13 03:01:15,386 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
      2014-05-13 03:01:15,403 INFO solr.SolrMappingReader - source: content dest: content
      2014-05-13 03:01:15,403 INFO solr.SolrMappingReader - source: title dest: title
      2014-05-13 03:01:15,403 INFO solr.SolrMappingReader - source: host dest: host
      2014-05-13 03:01:15,404 INFO solr.SolrMappingReader - source: batchId dest: batchId
      2014-05-13 03:01:15,404 INFO solr.SolrMappingReader - source: boost dest: boost
      2014-05-13 03:01:15,404 INFO solr.SolrMappingReader - source: digest dest: digest
      2014-05-13 03:01:15,404 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
      2014-05-13 03:01:15,405 INFO indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
      2014-05-13 03:01:15,405 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
      2014-05-13 03:01:15,405 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
      2014-05-13 03:01:15,405 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
      2014-05-13 03:01:15,405 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
      2014-05-13 03:01:15,405 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
      2014-05-13 03:01:15,426 WARN zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
      2014-05-13 03:01:15,442 WARN mapred.FileOutputCommitter - Output path is null in cleanup
      2014-05-13 03:01:16,144 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
      2014-05-13 03:01:16,144 INFO indexer.IndexingJob - Active IndexWriters :
      SOLRIndexWriter
      solr.server.url : URL of the SOLR instance (mandatory)
      solr.commit.size : buffer size when sending to SOLR (default 1000)
      solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
      solr.auth : use authentication (default false)
      solr.auth.username : use authentication (default false)
      solr.auth : username for authentication
      solr.auth.password : password for authentication

      2014-05-13 03:01:16,145 INFO solr.SolrMappingReader - source: content dest: content
      2014-05-13 03:01:16,145 INFO solr.SolrMappingReader - source: title dest: title
      2014-05-13 03:01:16,145 INFO solr.SolrMappingReader - source: host dest: host
      2014-05-13 03:01:16,145 INFO solr.SolrMappingReader - source: batchId dest: batchId
      2014-05-13 03:01:16,145 INFO solr.SolrMappingReader - source: boost dest: boost
      2014-05-13 03:01:16,145 INFO solr.SolrMappingReader - source: digest dest: digest
      2014-05-13 03:01:16,145 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
      2014-05-13 03:01:16,338 INFO solr.SolrIndexWriter - Total 0 document is added.
      2014-05-13 03:01:16,338 INFO indexer.IndexingJob - IndexingJob: done.

          People

            Assignee: Unassigned
            Reporter: Bl4ck1c3 Ralf
            Votes: 0
            Watchers: 3
