SOLR-799: Add support for hash-based exact/near-duplicate document handling

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: update
    • Labels: None

      Description

      Hash-based duplicate document detection is efficient and allows for blocking duplicates as well as field collapsing. Let's put it into Solr.

      http://wiki.apache.org/solr/Deduplication
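
      To make the idea concrete, here is a minimal sketch of the exact-duplicate case only: a signature is hashed from a chosen set of fields, and any document whose signature has already been seen is blocked. The class name `SignatureDedupSketch`, the field names, and the use of MD5 are assumptions for illustration and are not taken from the attached patches. Near-duplicate detection would need a fuzzy signature in place of the plain digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the implementation in SOLR-799.patch.
public class SignatureDedupSketch {

    // Signatures of the documents accepted so far.
    private final Set<String> seen = new HashSet<>();

    /** Hex MD5 digest over the chosen fields, taken in the given order. */
    static String signature(Map<String, String> doc, List<String> fields) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String field : fields) {
                String value = doc.getOrDefault(field, "");
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                // Separator byte so that "ab" + "c" hashes differently from "a" + "bc".
                md5.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns true if the document is new, false if it is blocked as a duplicate. */
    boolean add(Map<String, String> doc, List<String> fields) {
        return seen.add(signature(doc, fields));
    }

    public static void main(String[] args) {
        SignatureDedupSketch dedup = new SignatureDedupSketch();
        List<String> fields = List.of("title", "body");

        Map<String, String> first = Map.of("id", "1", "title", "Hello", "body", "World");
        // Same title and body under a different id: an exact duplicate on the chosen fields.
        Map<String, String> copy = Map.of("id", "2", "title", "Hello", "body", "World");

        System.out.println(dedup.add(first, fields)); // true: first time this signature is seen
        System.out.println(dedup.add(copy, fields));  // false: duplicate signature, blocked
    }
}
```

      Storing the signature in its own field instead of rejecting the document outright would leave room for collapsing results on that field at query time; the sketch covers only the blocking variant.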

        Attachments

        1. SOLR-799.patch
          23 kB
          Mark Miller
        2. SOLR-799.patch
          36 kB
          Mark Miller
        3. SOLR-799.patch
          35 kB
          Mark Miller
        4. SOLR-799.patch
          33 kB
          Mark Miller

            People

            • Assignee: yseeley@gmail.com Yonik Seeley
            • Reporter: markrmiller@gmail.com Mark Miller
            • Votes: 2
            • Watchers: 4
