OpenNLP / OPENNLP-253

Add text similarity / relevance / syntactic match component based on parse trees

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Parser
    • Labels: None
    • Environment: Java

    Description

      The proposed component relies on the OpenNLP parser and gives search engineers a simple relevance verification tool based on machine learning of syntactic parse trees.

      The value for the search engineering community is that engineers do not need to be familiar with NLP to use the syntactic generalization component (the proposed component), which performs parsing/chunking with OpenNLP and then applies graph-based learning to assess relevance.
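      A minimal sketch of the generalization idea, for illustration only; it is not the component's actual code. It assumes the phrases have already been tokenized and POS-tagged (OpenNLP's English models emit Penn Treebank tags); here the tags are written by hand. Two phrases are aligned by the longest common subsequence of their POS tags, a word is kept where both phrases share it, and `*` marks positions where only the tag matches. The class name `PhraseGeneralizer`, the LCS alignment and the 1.0 / 0.5 scoring weights are hypothetical simplifications, not the published algorithm.

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseGeneralizer {

    /** A token with its part-of-speech tag, e.g. camera/NN. */
    public record Token(String word, String pos) {
        @Override
        public String toString() {
            return word + "/" + pos;
        }
    }

    /**
     * Hypothetical simplification: aligns two tagged phrases by the longest common
     * subsequence of their POS tags; keeps the word where both phrases agree
     * (case-insensitively), otherwise replaces it with "*".
     */
    public static List<Token> generalize(List<Token> a, List<Token> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the LCS of the POS tags of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).pos().equals(b.get(j).pos())) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }
        // Walk the table from the start to recover one optimal alignment.
        List<Token> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            Token x = a.get(i), y = b.get(j);
            if (x.pos().equals(y.pos())) {
                String word = x.word().equalsIgnoreCase(y.word()) ? x.word().toLowerCase() : "*";
                result.add(new Token(word, x.pos()));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    /** Hypothetical score: 1.0 per shared word, 0.5 per tag-only match. */
    public static double score(List<Token> generalization) {
        double s = 0;
        for (Token t : generalization) {
            s += t.word().equals("*") ? 0.5 : 1.0;
        }
        return s;
    }

    public static void main(String[] args) {
        List<Token> query = List.of(new Token("digital", "JJ"), new Token("camera", "NN"),
                new Token("with", "IN"), new Token("zoom", "NN"));
        List<Token> result = List.of(new Token("compact", "JJ"), new Token("camera", "NN"),
                new Token("without", "IN"), new Token("zoom", "NN"));
        List<Token> g = generalize(query, result);
        System.out.println(g + " score=" + score(g));
    }
}
```

      For "digital camera with zoom" versus "compact camera without zoom" this prints `[*/JJ, camera/NN, */IN, zoom/NN] score=3.0`. A larger, more specific generalization indicates that the two phrases share more syntactic structure.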

      One expected usage scenario: a search library such as Lucene retrieves candidate results, and this component accepts or rejects each of them as relevant or irrelevant according to the proposed syntactic generalization measure.
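      The accept/reject step can be sketched as a threshold filter over a ranked result list. This sketch assumes the results have already been retrieved (for example by Lucene) and are available as plain strings; it does not call Lucene. `RelevanceFilter`, `sharedWords` and the threshold value are hypothetical, and `sharedWords` is only a placeholder word-overlap measure standing in for the syntactic generalization score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

public class RelevanceFilter {

    // Pluggable relevance measure (query, result) -> score; hypothetical design.
    private final ToDoubleBiFunction<String, String> similarity;
    private final double threshold;

    public RelevanceFilter(ToDoubleBiFunction<String, String> similarity, double threshold) {
        this.similarity = similarity;
        this.threshold = threshold;
    }

    /** Keeps results whose score reaches the threshold, preserving the original ranking order. */
    public List<String> filter(String query, List<String> results) {
        List<String> accepted = new ArrayList<>();
        for (String r : results) {
            if (similarity.applyAsDouble(query, r) >= threshold) {
                accepted.add(r);
            }
        }
        return accepted;
    }

    /** Placeholder measure: number of distinct lower-cased words shared by both strings. */
    public static double sharedWords(String a, String b) {
        Set<String> wa = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\W+")));
        Set<String> wb = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\W+")));
        wa.retainAll(wb);
        wa.remove(""); // split() can yield an empty leading token
        return wa.size();
    }

    public static void main(String[] args) {
        RelevanceFilter f = new RelevanceFilter(RelevanceFilter::sharedWords, 2.0);
        List<String> hits = List.of(
                "Cheap digital camera with optical zoom",
                "Camera bags and straps",
                "Zoom meeting tips");
        System.out.println(f.filter("digital camera with zoom", hits));
    }
}
```

      Keeping the similarity function pluggable means the placeholder can be replaced by a parse-tree-based score without touching the filtering logic. With the placeholder and a threshold of 2.0, only "Cheap digital camera with optical zoom" is accepted.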

      This code has been deployed commercially over the last two years at datran.com and zvents.com and serves more than 20 million users monthly.

      There are a number of publications on this project, including:

      http://portal.acm.org/citation.cfm?id=1881190

      http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS11/paper/view/2573

      Attachments

        1. text_similarity_proposal_for_opennlp.zip
          149 kB
          Boris Galitsky
        2. text_similarity_proposal_for_opennlp.test.zip
          9 kB
          Boris Galitsky

          People

            Assignee: Jörn Kottmann (joern)
            Reporter: Boris Galitsky (bgalitsky)
            Votes: 1
            Watchers: 0

              Time Tracking

                Original Estimate: 672h
                Remaining Estimate: 672h
                Time Spent: Not Specified