Stanbol (Retired) / STANBOL-197

Enhancement Engine for Wikipedia/DBpedia-based topic classification of text content


Details

    Description

      Implementation plan:

      Use MoreLikeThis queries against a SolrYard instance in which each topic is indexed as one document that aggregates the abstracts of all DBpedia entities categorized under that SKOS topic.

      Such an index can be built with the Pig scripts available at:
      https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus
      or
      https://github.com/ogrisel/dbpediakit
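      A minimal sketch of the aggregation step in plain Java, assuming the DBpedia data has already been reduced to two in-memory maps: entity URI to abstract, and entity URI to the SKOS topic URIs it is categorized under. The class and method names are hypothetical, and this is not the code of the Pig scripts above, only the same idea on a small scale: each topic becomes one document whose text concatenates the abstracts of its entities, ready to be indexed in the SolrYard.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: builds one text document per SKOS topic. */
class TopicCorpusBuilder {

    /**
     * @param abstracts    entity URI -> abstract text
     * @param entityTopics entity URI -> SKOS topic URIs the entity is categorized under
     * @return topic URI -> aggregated abstracts, ordered by topic URI
     */
    static Map<String, String> buildTopicDocuments(Map<String, String> abstracts,
                                                   Map<String, List<String>> entityTopics) {
        // TreeMap keeps topics sorted so the output order is stable
        Map<String, List<String>> textsByTopic = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : entityTopics.entrySet()) {
            String abstractText = abstracts.get(e.getKey());
            if (abstractText == null || abstractText.isEmpty()) {
                continue; // entities without an abstract contribute no text
            }
            for (String topic : e.getValue()) {
                textsByTopic.computeIfAbsent(topic, t -> new ArrayList<>()).add(abstractText);
            }
        }
        // Concatenate the abstracts of each topic into a single document body
        Map<String, String> topicDocs = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : textsByTopic.entrySet()) {
            topicDocs.put(e.getKey(), String.join("\n", e.getValue()));
        }
        return topicDocs;
    }
}
```

      An entity categorized under several topics contributes its abstract to each of them, which mirrors how a DBpedia entity can carry several categories.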

      To run MoreLikeThis queries through the SolrJ API:

      #1 - Define the MLT handler in solrconfig.xml (it is not defined in the example
      solrconfig.xml I was using):

      <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
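      Once the handler is registered under /mlt it can also be called over plain HTTP. The sketch below only builds the request URL with the JDK; the Solr base URL and the unique key value used with it are hypothetical. mlt.fl, mlt.mintf, mlt.mindf and mlt.match.include are standard MoreLikeThis parameters, and rows caps how many similar documents (here, candidate topics) are returned.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a GET URL for the /mlt request handler. */
class MltRequestUrl {

    static String build(String solrBaseUrl, String query, String similarityFields,
                        int minTermFreq, int minDocFreq, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);                    // document whose neighbours we want
        params.put("mlt.fl", similarityFields);    // fields used to pick "interesting" terms
        params.put("mlt.mintf", Integer.toString(minTermFreq));
        params.put("mlt.mindf", Integer.toString(minDocFreq));
        params.put("mlt.match.include", "false");  // leave the matched document out of the results
        params.put("rows", Integer.toString(rows));

        // URL-encode each key and value (requires Java 10+ for the Charset overload)
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            qs.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        // Drop a trailing slash so the handler path is appended cleanly
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/mlt?" + qs;
    }
}
```

      Fetching the resulting URL (for example with java.net.http.HttpClient) returns the topic documents most similar to the query document; parsing the response is left out here.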

      #2 - With SolrJ, access the MLT handler with something similar to the following:

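      import org.apache.solr.client.solrj.SolrQuery;
      import org.apache.solr.common.params.MoreLikeThisParams;

      SolrQuery query = new SolrQuery();
      query.setQueryType("/" + MoreLikeThisParams.MLT); // route to /mlt (newer SolrJ versions deprecate this in favor of setRequestHandler)
      query.set(MoreLikeThisParams.MATCH_INCLUDE, false); // do not return the matched document itself
      query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1); // keep terms found in at least 1 indexed document
      query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1); // keep terms occurring at least once in the source document
      query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
      query.setQuery("Your query here or in my case the unique key field:value");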
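      MIN_TERM_FREQ and MIN_DOC_FREQ control which terms of the source text MoreLikeThis keeps before it builds its query. The toy sketch below mimics that selection step in plain Java so the effect of the two thresholds is easy to see: a term is kept only if it occurs at least minTermFreq times in the input and in at least minDocFreq topic documents, and the kept terms are ranked by tf * idf. It is a simplified approximation under stated assumptions (regex tokenization, idf = ln(N / df) + 1, hypothetical class name), not Lucene's MoreLikeThis implementation, which also applies analyzers, stop words and its own similarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy approximation of MoreLikeThis "interesting term" selection (not Lucene's code). */
class InterestingTerms {

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /**
     * @param minTermFreq like mlt.mintf: minimum occurrences in the input text
     * @param minDocFreq  like mlt.mindf: minimum number of topic documents containing the term
     * @param maxTerms    like mlt.maxqt: keep at most this many terms
     */
    static List<String> select(String input, List<String> topicDocs,
                               int minTermFreq, int minDocFreq, int maxTerms) {
        // Document frequency of each term across the topic documents
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : topicDocs) {
            for (String term : new HashSet<>(tokenize(doc))) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        // Term frequency of each term in the input text
        Map<String, Integer> termFreq = new HashMap<>();
        for (String term : tokenize(input)) {
            termFreq.merge(term, 1, Integer::sum);
        }
        int numDocs = topicDocs.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int tf = e.getValue();
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (tf < minTermFreq || df < minDocFreq || df == 0) {
                continue; // filtered out by the thresholds, or unknown to the index
            }
            double idf = Math.log((double) numDocs / df) + 1.0;
            scores.put(e.getKey(), tf * idf);
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        // Highest tf*idf first; ties broken alphabetically for deterministic output
        terms.sort((a, b) -> {
            int cmp = Double.compare(scores.get(b), scores.get(a));
            return cmp != 0 ? cmp : a.compareTo(b);
        });
        return terms.subList(0, Math.min(maxTerms, terms.size()));
    }
}
```

      With both thresholds set to 1, as in the snippet above, almost every term that also appears in the topic index survives, which suits short input texts; raising either value trades recall for less noisy queries.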


            People

              Rafa Haro (rafaharo)
              Olivier Grisel (ogrisel)
              Votes: 1
              Watchers: 3
