  1. Nutch
  2. NUTCH-850

SolrDeleteDuplicates needs to clone the SolrRecord objects

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2, nutchgora
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      The reduce() method of SolrDeleteDuplicates deduplicates SolrRecords by signature. The first SolrRecord is stored in a variable, recordToKeep, and compared with the subsequent SolrRecords that share the same signature. The problem is that Hadoop reuses that first instance on each call to values.next(), so recordToKeep ends up holding the same values as the record returned by the latest call to values.next().

      The attached patch clones each SolrRecord before assigning it to recordToKeep, which avoids the problem.
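
      The aliasing problem described above can be reproduced without Hadoop. The following is a minimal sketch, not the code from the patch: the Record class, reusingIterator() and keepNewest() are hypothetical stand-ins, and the "keep the newest timestamp" rule is only an assumed comparison for illustration. The iterator refills one shared instance on every next() call, which is how Hadoop's reduce-side value iterators behave.

```java
import java.util.Iterator;

class ReuseDemo {
    /** Hypothetical stand-in for a record type; this is NOT the Nutch SolrRecord class. */
    static class Record {
        String id;
        long tstamp;

        Record() {}

        /** Copy constructor: takes a snapshot of the current field values. */
        Record(Record other) {
            this.id = other.id;
            this.tstamp = other.tstamp;
        }

        void set(String id, long tstamp) {
            this.id = id;
            this.tstamp = tstamp;
        }
    }

    /** Returns an iterator that overwrites and returns the same instance on every next(). */
    static Iterator<Record> reusingIterator(final String[] ids, final long[] tstamps) {
        final Record shared = new Record();
        return new Iterator<Record>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < ids.length;
            }

            @Override
            public Record next() {
                shared.set(ids[i], tstamps[i]);
                i++;
                return shared; // same object each time, only its contents change
            }
        };
    }

    /**
     * Returns the id of the record with the highest timestamp (assumed rule).
     * With cloneFirst == false, recordToKeep aliases the shared instance,
     * so it silently follows whatever next() returned last.
     */
    static String keepNewest(Iterator<Record> values, boolean cloneFirst) {
        Record first = values.next();
        Record recordToKeep = cloneFirst ? new Record(first) : first;
        while (values.hasNext()) {
            Record current = values.next();
            if (current.tstamp > recordToKeep.tstamp) {
                recordToKeep = cloneFirst ? new Record(current) : current;
            }
        }
        return recordToKeep.id;
    }
}
```

      With ids {"a", "b", "c"} and timestamps {30, 10, 20}, calling keepNewest with cloneFirst set to true returns "a", the newest record. With cloneFirst set to false it returns "c", simply because "c" was the last record the iterator produced.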

        Attachments

        1. NUTCH-850.patch
          1 kB
          Julien Nioche

            People

            • Assignee:
              jnioche Julien Nioche
              Reporter:
              jnioche Julien Nioche