  Nutch / NUTCH-1100

SolrDedup broken


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.9
    • Component/s: indexer
    • Labels: None

    Description

      Some Solr indices cannot be deduplicated from Nutch. For unknown reasons, Nutch throws the exception below. The Solr logs show nothing unusual: the queries look normal and appear to succeed.

      java.lang.NullPointerException
              at org.apache.hadoop.io.Text.encode(Text.java:388)
              at org.apache.hadoop.io.Text.set(Text.java:178)
              at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:272)
              at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:243)
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
              at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
              at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
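
      The trace shows Hadoop's Text.set receiving a null string from the record reader's next() method in SolrDeleteDuplicates. One plausible cause, not confirmed in this issue, is a Solr document that lacks the field used as the deduplication key. The sketch below is hypothetical and is not the attached patch: it models Solr documents as plain maps, assumes the key field is named "digest", and uses encodeKey as a stand-in for Text.set (which rejects null the same way). It shows how skipping documents without that field avoids passing null to the encoder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the record reader in SolrDeleteDuplicates.
// Assumptions: each Solr document is modelled as a Map, and the dedup key
// is read from a field named "digest". A document without that field
// yields null, which Hadoop's Text.set(String) rejects with an NPE.
public class DedupKeySketch {

    static final String DIGEST_FIELD = "digest"; // assumed field name

    /** Stand-in for Text.set(String): throws on null, as Text.encode does. */
    static String encodeKey(String s) {
        if (s == null) {
            throw new NullPointerException("dedup key is null");
        }
        return s;
    }

    /** Collects dedup keys, skipping documents that lack the digest field. */
    static List<String> collectKeys(List<? extends Map<String, ?>> docs) {
        List<String> keys = new ArrayList<>();
        for (Map<String, ?> doc : docs) {
            Object digest = doc.get(DIGEST_FIELD);
            if (digest == null) {
                continue; // guard: skip rather than call encodeKey(null)
            }
            keys.add(encodeKey(digest.toString()));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("id", "http://a/", DIGEST_FIELD, "d1"),
            Map.of("id", "http://b/"), // no digest field
            Map.of("id", "http://c/", DIGEST_FIELD, "d2"));
        System.out.println(collectKeys(docs)); // prints [d1, d2]
    }
}
```

      Skipping is only one option; a real fix might instead log the offending document or request the key field explicitly from Solr.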
      

      Attachments

        1. NUTCH-1100-1.6-1.patch (0.7 kB, Markus Jelsma)


            People

              Assignee: Unassigned
              Reporter: Markus Jelsma
              Votes: 0
              Watchers: 9
