  1. Stanbol (Retired)
  2. STANBOL-246

Exact name match should get boosted in the entity hub SolrYard indices

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0-incubating
    • Component/s: Entityhub
    • Labels: None

    Description

      For instance, using the default embedded solryard index:

       curl -X POST -d "name=United States&limit=10&offset=0" http://localhost:8080/entityhub/site/dbpedia/find
      

      The first results are "United States Navy" and "United States Air Force"; "United States" itself only comes in third position. See the attached JSON output.

      Exact name matches (or close-to-exact matches) should get a score boost. This can probably be implemented with a FuzzyQuery using a minSimilarity of 0.8f, for instance.

      https://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/search/FuzzyQuery.html
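
      To illustrate the intended effect outside of Solr, here is a minimal Python sketch that re-ranks a result list by boosting labels whose similarity to the query reaches 0.8. It is not Lucene's FuzzyQuery: the normalized edit-distance similarity only approximates what Lucene computes, and the function names, the boost factor and the base scores are all hypothetical values chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(query: str, label: str) -> float:
    """Similarity in [0, 1]; an assumption, not Lucene's exact formula."""
    q, l = query.strip().lower(), label.strip().lower()
    longest = max(len(q), len(l))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(q, l) / longest


def rerank(query, results, min_similarity=0.8, boost=10.0):
    """Multiply the score of (near-)exact label matches by `boost`."""
    rescored = []
    for label, score in results:
        if similarity(query, label) >= min_similarity:
            score *= boost
        rescored.append((label, score))
    return sorted(rescored, key=lambda r: r[1], reverse=True)


# Hypothetical base scores reproducing the ordering reported in this issue.
results = [
    ("United States Navy", 3.0),
    ("United States Air Force", 2.8),
    ("United States", 2.5),
]
ranked = rerank("United States", results)
for label, score in ranked:
    print(label, score)
```

      With these made-up scores, only "United States" passes the 0.8 threshold, so it moves to the first position while the relative order of the other two results is unchanged.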

      Maybe in this case the popularity boosts are misleading because they are based on naive incoming link counts. Using a PageRank-style centrality score might work better:

      https://github.com/julienledem/Pig-scripting-examples/tree/master/Page%20Rank
      https://github.com/mesos/spark/blob/master/bagel/src/main/scala/spark/bagel/examples/WikipediaPageRank.scala
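
      The difference between the two popularity measures can be sketched on a toy graph. The Python code below is a simplified power-iteration PageRank, not the Pig or Spark implementations linked above; the graph, the node names and the damping factor of 0.85 are assumptions made for the example.

```python
def in_degree(links):
    """Count incoming links per node (the naive popularity measure)."""
    counts = {node: 0 for node in links}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; every link target must be a key of `links`."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            for target in links[node]:
                new_rank[target] += damping * rank[node] / len(links[node])
        rank = new_rank
    return rank


# Hypothetical graph: three minor pages link to X, and X alone links to H.
links = {
    "minor1": ["X"],
    "minor2": ["X"],
    "minor3": ["X"],
    "X": ["H"],
    "H": [],
}
counts = in_degree(links)
ranks = pagerank(links)
for node in links:
    print(node, counts[node], round(ranks[node], 3))
```

      In this toy graph X has three incoming links and H only one, yet H ends up with the higher PageRank because its single link comes from a well-linked page. A boost derived from such a score would depend on who links to an entity, not just on how many pages do.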

          People

            Assignee: Rupert Westenthaler (rwesten)
            Reporter: Olivier Grisel (ogrisel)