Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2, 7.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      I'm planning to implement LSH (locality-sensitive hashing), which supports queries like this:

      Find documents whose similarity score with a given document is 0.8 or higher. The similarity measure could be cosine, Jaccard, Euclidean distance, etc.
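      For Jaccard similarity, LSH is commonly built on MinHash: the probability that two sets share the same minimum hash value equals their Jaccard similarity, and grouping the signature into b bands of r rows makes a pair a candidate with probability 1 - (1 - s^r)^b, which approximates a threshold near (1/b)^(1/r). The following is a minimal Java sketch of that standard technique, not Lucene's API. The class name MinHashSketch, the tokenizer (lowercase, split on non-alphanumeric characters), the use of String.hashCode as the base hash, and the choice of 10 bands x 10 rows (threshold roughly 0.79) are all illustrative assumptions. Cosine similarity would use a different hash family (e.g. random hyperplanes), which is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
public class MinHashSketch {
    // Mersenne prime 2^31 - 1 for the hash family h(x) = (a*x + b) mod PRIME.
    private static final long PRIME = 2_147_483_647L;

    private final long[] a;
    private final long[] b;

    // Draw numHashes random hash functions from a fixed seed so that every
    // document is hashed with exactly the same functions.
    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) (PRIME - 1)); // a in [1, PRIME - 1]
            b[i] = rnd.nextInt((int) (PRIME - 1));     // b in [0, PRIME - 2]
        }
    }

    // Assumed tokenizer: lowercase, split on runs of non-alphanumeric chars.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // MinHash signature: for each hash function, the minimum hash over all tokens.
    long[] signature(Set<String> tokens) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String t : tokens) {
            long x = Math.floorMod((long) t.hashCode(), PRIME);
            for (int i = 0; i < a.length; i++) {
                // a < 2^31 and x < 2^31, so a * x + b stays below 2^63 (no overflow).
                long h = (a[i] * x + b[i]) % PRIME;
                sig[i] = Math.min(sig[i], h);
            }
        }
        return sig;
    }

    // Fraction of matching signature positions estimates Jaccard similarity.
    static double estimateJaccard(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }

    // Probability that a pair with Jaccard similarity s collides in at least
    // one of `bands` bands of `rows` rows each: 1 - (1 - s^rows)^bands.
    static double candidateProbability(double s, int rows, int bands) {
        return 1.0 - Math.pow(1.0 - Math.pow(s, rows), bands);
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(100, 42L); // 100 hashes = 10 bands x 10 rows
        long[] q = mh.signature(tokens("Solr is an open source search engine"));
        long[] d1 = mh.signature(tokens("Solr is an open source search engine based on Lucene"));
        System.out.printf(Locale.ROOT, "estimated Jaccard(query, doc 1): %.2f%n", estimateJaccard(q, d1));
        for (double s : new double[] {0.6, 0.7, 0.8, 0.9}) {
            System.out.printf(Locale.ROOT, "s=%.1f -> P(candidate) = %.3f%n", s, candidateProbability(s, 10, 10));
        }
    }
}
```

      The banding step is what turns a fixed-size signature into an index: each band can be stored as a single term, so a query only needs to match documents sharing at least one band term, and the candidates can then be re-scored. Raising r sharpens the cutoff and pushes the threshold up; raising b lowers it.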

      For example, given the following corpus:

      1. Solr is an open source search engine based on Lucene
      2. Solr is an open source enterprise search engine based on Lucene
      3. Solr is an popular open source enterprise search engine based on Lucene
      4. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java

      We want to find documents with a Jaccard similarity of 0.6 or higher to this document:

      Solr is an open source search engine

      It would return only docs 1, 2 and 3 (MoreLikeThis would also return doc 4).
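      To make the example concrete, here is a minimal Java sketch that computes the exact word-level Jaccard similarity between the query and each document in the corpus above. The class name JaccardExample and the tokenizer (lowercase, split on runs of non-alphanumeric characters, duplicates collapsed into a set) are illustrative assumptions; a Lucene analyzer could tokenize differently and give different scores.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
public class JaccardExample {
    static final String[] CORPUS = {
        "Solr is an open source search engine based on Lucene",
        "Solr is an open source enterprise search engine based on Lucene",
        "Solr is an popular open source enterprise search engine based on Lucene",
        "Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java"
    };

    // Assumed tokenizer: lowercase, split on runs of non-alphanumeric chars.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Exact Jaccard similarity: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> query = tokens("Solr is an open source search engine");
        for (int i = 0; i < CORPUS.length; i++) {
            double score = jaccard(query, tokens(CORPUS[i]));
            System.out.printf(Locale.ROOT, "doc %d: %.3f%s%n", i + 1, score,
                    score >= 0.6 ? "  (>= 0.6)" : "");
        }
    }
}
```

      Under this tokenization, doc 1 scores 7/10 = 0.700, doc 2 scores 7/11 (about 0.636), doc 3 scores 7/12 (about 0.583) and doc 4 scores 3/20 = 0.150. So doc 3 sits just below 0.6 on exact word-level Jaccard, while doc 4, which MoreLikeThis would still match on shared terms such as "search" and "engine", is far below it. Because an LSH index only approximates the threshold, a near-threshold document like doc 3 can still be returned, whereas a document as dissimilar as doc 4 rarely is.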

      Attachments

        1. LUCENE-6968.patch
          12 kB
          Cao Manh Dat
        2. LUCENE-6968.patch
          22 kB
          Cao Manh Dat
        3. LUCENE-6968.patch
          37 kB
          Andy Hind
        4. LUCENE-6968.4.patch
          10 kB
          Tommaso Teofili
        5. LUCENE-6968.5.patch
          31 kB
          Andy Hind
        6. LUCENE-6968.6.patch
          1.0 kB
          Tommaso Teofili

    People

      Assignee: Tommaso Teofili (teofili)
      Reporter: Cao Manh Dat (caomanhdat)
      Votes: 0
      Watchers: 16

              Dates

                Created:
                Updated:
                Resolved: