  1. Nutch
  2. NUTCH-850

SolrDeleteDuplicates needs to clone the SolrRecord objects


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2, nutchgora
    • Component/s: indexer
    • Labels: None
    • Patch Info: Patch Available

    Description

      The reduce() method of SolrDeleteDuplicates deduplicates SolrRecord objects that share a signature. The first SolrRecord is stored in the variable recordToKeep and compared with each subsequent SolrRecord that has the same signature. The problem is that Hadoop reuses the same object instance on every call to values.next(), so recordToKeep ends up holding the values of the most recent call to values.next() rather than those of the record originally kept.

      The attached patch clones each SolrRecord before assigning it to recordToKeep, which avoids the problem.
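
      The effect can be reproduced without Hadoop. The sketch below is not the attached patch: Record is a hypothetical stand-in for SolrRecord, reusingIterator() only imitates the way Hadoop hands the same instance back from values.next(), and the keep-the-newest-tstamp rule is a simplification of what reduce() does. It is meant only to show why holding a reference fails and why copying the record first fixes it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Hypothetical stand-in for SolrRecord: a small mutable value holder.
    static final class Record {
        String id;
        long tstamp;

        Record() {}

        // Copy constructor, playing the role of the clone in the patch.
        Record(Record other) {
            this.id = other.id;
            this.tstamp = other.tstamp;
        }

        void set(String id, long tstamp) {
            this.id = id;
            this.tstamp = tstamp;
        }
    }

    // Imitates Hadoop's value iterator: next() refills and returns ONE shared instance.
    static Iterator<Record> reusingIterator(String[] ids, long[] tstamps) {
        return new Iterator<Record>() {
            private final Record shared = new Record();
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < ids.length;
            }

            @Override
            public Record next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.set(ids[i], tstamps[i]);
                i++;
                return shared;
            }
        };
    }

    // Keeps the record with the newest tstamp and returns the ids of the others.
    // copy == false holds a reference (the bug); copy == true copies first (the fix).
    static List<String> idsToDelete(Iterator<Record> values, boolean copy) {
        List<String> deleted = new ArrayList<>();
        Record first = values.next();
        Record recordToKeep = copy ? new Record(first) : first;
        while (values.hasNext()) {
            Record current = values.next();
            if (current.tstamp > recordToKeep.tstamp) {
                deleted.add(recordToKeep.id);
                recordToKeep = copy ? new Record(current) : current;
            } else {
                deleted.add(current.id);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "c"};
        long[] tstamps = {1L, 3L, 2L};
        // Without a copy, recordToKeep aliases the shared instance and is compared with itself.
        System.out.println(idsToDelete(reusingIterator(ids, tstamps), false)); // prints [b, c]
        // With a copy, the newest record "b" is kept and the other two are deleted.
        System.out.println(idsToDelete(reusingIterator(ids, tstamps), true));  // prints [a, c]
    }
}
```

      In the run without a copy, the newest record "b" is marked for deletion and "a" is never deleted, because recordToKeep and the current value are the same object.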

      Attachments

        1. NUTCH-850.patch
          1 kB
          Julien Nioche

          People

            Assignee: Julien Nioche (jnioche)
            Reporter: Julien Nioche (jnioche)
            Votes: 0
            Watchers: 0
