NUTCH-2205

Nutch solrdedup error in solrcloud for larger docs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.5
    • Component/s: indexer
    • Labels: None
    • Environment: CentOS 6.5, JDK 1.7.0_75, Tomcat 8.0.9, Hadoop 2.5.2, ZooKeeper 3.4.6, HBase 0.98.8, Solr 4.8.1, Nutch 2.3.1

    Description

      When the number of Solr documents exceeds 9000, the Nutch solrdedup job fails. This is the log:

      http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
      16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: starting...
      16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: Solr url: http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
      16/01/25 17:02:39 INFO client.RMProxy: Connecting to ResourceManager at master.Itble/10.192.1.100:8032
      16/01/25 17:02:43 INFO mapreduce.JobSubmitter: number of splits:1
      16/01/25 17:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1453104806095_0162
      16/01/25 17:02:44 INFO impl.YarnClientImpl: Submitted application application_1453104806095_0162
      16/01/25 17:02:44 INFO mapreduce.Job: The url to track the job: http://master.Itble:8088/proxy/application_1453104806095_0162/
      16/01/25 17:02:44 INFO mapreduce.Job: Running job: job_1453104806095_0162
      16/01/25 17:02:54 INFO mapreduce.Job: Job job_1453104806095_0162 running in uber mode : false
      16/01/25 17:02:54 INFO mapreduce.Job: map 0% reduce 0%
      16/01/25 17:03:02 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_0, Status : FAILED
      Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1
      at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
      at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
      at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
      at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

      16/01/25 17:03:12 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_1, Status : FAILED
      Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1, http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1
      at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
      at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
      at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
      at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

      16/01/25 17:03:22 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_2, Status : FAILED
      Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1, http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1
      at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
      at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
      at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
      at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
      at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

      16/01/25 17:03:31 INFO mapreduce.Job: map 100% reduce 100%
      16/01/25 17:03:31 INFO mapreduce.Job: Job job_1453104806095_0162 failed with state FAILED due to: Task failed task_1453104806095_0162_m_000000
      Job failed as tasks failed. failedMaps:1 failedReduces:0

      16/01/25 17:03:31 INFO mapreduce.Job: Counters: 8
      Job Counters
      Failed map tasks=4
      Launched map tasks=4
      Other local map tasks=4
      Total time spent by all maps in occupied slots (ms)=30150
      Total time spent by all reduces in occupied slots (ms)=0
      Total time spent by all map tasks (ms)=30150
      Total vcore-seconds taken by all map tasks=30150
      Total megabyte-seconds taken by all map tasks=46310400
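
      As a reading aid for the log above, here is a minimal sketch that lists the failed task attempts and counts how often each Solr core appears in the "No live SolrServers available" errors. The function name summarize and both regular expressions are illustrative assumptions written for this log format; they are not part of Nutch, Hadoop, or SolrJ, and the script only summarizes the log without diagnosing the cause.

```python
import re
from collections import Counter

# Hypothetical helpers for this log format only (not a Nutch/Solr API).
ATTEMPT_RE = re.compile(r"Task Id : (attempt_\w+), Status : (\w+)")
NO_LIVE_RE = re.compile(r"No live SolrServers available to handle this request:(.*)$")

# The first failed attempt, copied verbatim from the log above.
SAMPLE = """\
16/01/25 17:03:02 INFO mapreduce.Job: Task Id : attempt_1453104806095_0162_m_000000_0, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1
"""


def summarize(log_text):
    """Return (failed_attempt_ids, core_name_counts) for a job log like the one above."""
    failed = []
    cores = Counter()
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            if match.group(2) == "FAILED":
                failed.append(match.group(1))
            continue
        match = NO_LIVE_RE.search(line)
        if match:
            # The error lists comma-separated core URLs; keep the last path segment.
            for url in match.group(1).split(","):
                url = url.strip()
                if url:
                    cores[url.rsplit("/", 1)[-1]] += 1
    return failed, cores


if __name__ == "__main__":
    failed_attempts, core_counts = summarize(SAMPLE)
    print("failed attempts:", failed_attempts)
    for core, count in sorted(core_counts.items()):
        print(core, count)
```

      Running it over the full log instead of SAMPLE shows which replicas each attempt tried, which can help when checking those cores in the Solr logs.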

          People

            Assignee: Unassigned
            Reporter: VictorHu
            Votes: 0
            Watchers: 4
