Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2, 7.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I'm planning to implement LSH (locality-sensitive hashing), which would support queries like this:

      Find documents whose similarity score with a given document is 0.8 or higher. The similarity measure could be cosine, Jaccard, Euclidean distance, etc.
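      A minimal standalone sketch of MinHash-based LSH for the Jaccard case follows. It assumes lowercase word tokens, hash functions of the form (a*x + b) mod (2^31 - 1), and signatures split into bands of equal size. The class and method names (`MinHashSketch`, `signature`, `estimateJaccard`, `isCandidate`) are hypothetical and are not a Lucene API. Note that MinHash only targets Jaccard similarity; cosine similarity is usually served by a different LSH family (random hyperplanes).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

public class MinHashSketch {
    // Mersenne prime 2^31 - 1 used as the modulus for h(x) = (a*x + b) mod P
    private static final long PRIME = 2_147_483_647L;

    private final long[] a;
    private final long[] b;

    // Hypothetical setup: numHashes random hash functions drawn from a fixed seed
    public MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // a in [1, P - 1]
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // b in [0, P - 1]
        }
    }

    // Assumed tokenization: lowercase, split on anything that is not a letter or digit
    static Set<String> tokens(String text) {
        Set<String> set = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                set.add(t);
            }
        }
        return set;
    }

    // MinHash signature: for each hash function, the minimum hash over all terms
    long[] signature(Set<String> terms) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String term : terms) {
            long x = term.hashCode() & 0x7fffffffL; // non-negative 31-bit value
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // a*x + b < 2^63, no overflow
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    // Fraction of matching positions estimates the Jaccard similarity
    static double estimateJaccard(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }

    // Banding: a pair is a candidate if every row of at least one band matches.
    // Positions left over when length is not divisible by bands are ignored.
    static boolean isCandidate(long[] s1, long[] s2, int bands) {
        int rows = s1.length / bands;
        for (int band = 0; band < bands; band++) {
            boolean match = true;
            for (int r = band * rows; r < (band + 1) * rows; r++) {
                if (s1[r] != s2[r]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(100, 42L); // 20 bands x 5 rows
        long[] q = mh.signature(tokens("Solr is an open source search engine"));
        long[] d = mh.signature(tokens("Solr is an open source search engine based on Lucene"));
        System.out.printf(Locale.ROOT, "estimated Jaccard: %.2f%n", estimateJaccard(q, d));
        System.out.println("candidate: " + isCandidate(q, d, 20));
    }
}
```

      With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so the effective threshold sits near (1/b)^(1/r), roughly 0.55 for 20 bands of 5 rows. Candidates can then be re-scored exactly to enforce a hard cutoff such as 0.8.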

      For example, given the following corpus:

      1. Solr is an open source search engine based on Lucene
      2. Solr is an open source enterprise search engine based on Lucene
      3. Solr is an popular open source enterprise search engine based on Lucene
      4. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java

      We want to find documents that have a Jaccard similarity of 0.6 or higher with this document:

      Solr is an open source search engine

      It will return only documents 1, 2 and 3 (MoreLikeThis would also return document 4).
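      As a baseline for what LSH approximates, the following sketch computes exact Jaccard scores for the example corpus against the query. It assumes lowercasing and splitting on non-alphanumeric characters as a hypothetical stand-in for a real Lucene analyzer; `ExactJaccard` is an illustrative name only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ExactJaccard {
    // Hypothetical analyzer: lowercase, split on anything that is not a letter or digit
    static Set<String> tokens(String text) {
        Set<String> set = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                set.add(t);
            }
        }
        return set;
    }

    // Exact Jaccard similarity: size of intersection / size of union
    static double jaccard(Set<String> x, Set<String> y) {
        Set<String> inter = new HashSet<>(x);
        inter.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
            "Solr is an open source search engine based on Lucene",
            "Solr is an open source enterprise search engine based on Lucene",
            "Solr is an popular open source enterprise search engine based on Lucene",
            "Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java");
        Set<String> query = tokens("Solr is an open source search engine");
        double threshold = 0.6;
        for (int i = 0; i < corpus.size(); i++) {
            double score = jaccard(query, tokens(corpus.get(i)));
            System.out.printf(Locale.ROOT, "doc %d: %.3f%s%n",
                i + 1, score, score >= threshold ? " (match)" : "");
        }
        // Prints:
        // doc 1: 0.700 (match)
        // doc 2: 0.636 (match)
        // doc 3: 0.583
        // doc 4: 0.150
    }
}
```

      Under this particular tokenization document 3 scores 7/12, about 0.58, slightly below 0.6, so whether it is returned depends on the analyzer, the exact threshold, and the approximation error of the LSH step. Document 4 stays far below the threshold either way.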

        Attachments

        1. LUCENE-6968.patch
          12 kB
          Cao Manh Dat
        2. LUCENE-6968.patch
          22 kB
          Cao Manh Dat
        3. LUCENE-6968.patch
          37 kB
          Andy Hind
        4. LUCENE-6968.4.patch
          10 kB
          Tommaso Teofili
        5. LUCENE-6968.5.patch
          31 kB
          Andy Hind
        6. LUCENE-6968.6.patch
          1.0 kB
          Tommaso Teofili

              People

              • Assignee:
                Tommaso Teofili
                Reporter:
                Cao Manh Dat
              • Votes:
                0
                Watchers:
                16

                Dates

                • Created:
                  Updated:
                  Resolved: