SOLR-799

Add support for hash based exact/near duplicate document handling

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: update
    • Labels: None

    Description

      Hash-based duplicate document detection is efficient and allows for blocking duplicates as well as field collapsing. Let's put it into Solr.

      http://wiki.apache.org/solr/Deduplication
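
The idea in the description can be illustrated with a minimal sketch. This is not the code from SOLR-799.patch: the class name `SignatureDedupSketch`, the choice of MD5, and the in-memory map are all assumptions made for illustration. It hashes the chosen field values into a signature and blocks any later document whose signature has already been seen. It covers exact duplicates only; near-duplicate detection would need a fuzzy signature, which is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of hash-based exact-duplicate blocking.
// Not Solr's implementation; all names here are assumptions.
public class SignatureDedupSketch {
    // Maps a signature to the id of the first document seen with it.
    private final Map<String, String> seen = new HashMap<>();

    /** Computes a hex MD5 signature over the given field values. */
    public static String signature(String... fieldValues) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String value : fieldValues) {
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                // Separator byte so ("ab", "c") and ("a", "bc") hash differently.
                md5.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the document was accepted, false if blocked as a duplicate. */
    public boolean add(String id, String... fieldValues) {
        return seen.putIfAbsent(signature(fieldValues), id) == null;
    }

    public static void main(String[] args) {
        SignatureDedupSketch index = new SignatureDedupSketch();
        System.out.println(index.add("doc1", "Solr", "hash based dedup")); // accepted
        System.out.println(index.add("doc2", "Solr", "hash based dedup")); // blocked
        System.out.println(index.add("doc3", "Solr", "something else"));   // accepted
    }
}
```

Storing the signature alongside each document, rather than rejecting the later one, would allow grouping on it at query time, which is the field collapsing use mentioned above.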

      Attachments

        1. SOLR-799.patch
          23 kB
          Mark Miller
        2. SOLR-799.patch
          36 kB
          Mark Miller
        3. SOLR-799.patch
          35 kB
          Mark Miller
        4. SOLR-799.patch
          33 kB
          Mark Miller

          People

            Assignee: Yonik Seeley (yseeley@gmail.com)
            Reporter: Mark Miller (markrmiller@gmail.com)
            Votes: 2
            Watchers: 4