OpenNLP / OPENNLP-471

DictionaryNameFinder has HASHing issues


Details

    Description

      The DictionaryNameFinder has trouble finding multi-token names because the find() method searches the dictionary one token at a time. The problem appears when the dictionary has no single-token (or shorter) entry that matches the start of the longer name.

      Having a dictionary with

      {"folic", "acid"}

      without an entry for

      {"folic"}

      will cause the find() method to miss the longer match entirely.
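
To make the failure mode concrete, here is a minimal sketch of a token-at-a-time search. It is not OpenNLP's code: the class `NaiveFinderSketch`, its `find` signature, and the rule "only extend a span while the span so far is itself an entry" are assumptions made to reproduce the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not OpenNLP's DictionaryNameFinder.
class NaiveFinderSketch {

    // Returns {start, endExclusive} for each match.
    // Assumed behaviour: a span is only extended while the span seen so far
    // is itself a dictionary entry.
    static List<int[]> find(Set<List<String>> dictionary, String[] tokens) {
        List<String> all = Arrays.asList(tokens);
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchEnd = -1;
            for (int end = start + 1; end <= tokens.length; end++) {
                if (!dictionary.contains(all.subList(start, end))) {
                    break; // gives up here, so a longer entry is never tried
                }
                matchEnd = end;
            }
            if (matchEnd > 0) {
                spans.add(new int[] {start, matchEnd});
                start = matchEnd;
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<List<String>> dictionary = new HashSet<>();
        dictionary.add(Arrays.asList("folic", "acid"));
        String[] sentence = {"folic", "acid"};
        // No {"folic"} entry, so the two-token name is not found.
        System.out.println(find(dictionary, sentence).size());
    }
}
```

Adding a `{"folic"}` entry to the same dictionary lets this sketch reach the two-token match, which is the workaround behind option 3 below.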

      Thanks to Jim for pushing this, and to my debugging skills for finding it.

      Three possibilities come to mind:
      1) One I don't really like: keep trying longer matches when shorter ones don't match. This makes the search a larger problem, and it quickly turns into a race to see who can wait longer.

      2) Return a possible match that may need exploring, or add a look-ahead mechanism that says: we don't match "folic" on its own, but if "acid" follows "folic", the dictionary has a match for that.

      3) Leave find() as is and add the shorter terms to the dictionary, maybe marking them as not-a-valid entry so we know a longer match is needed.
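
One way options 2 and 3 could be combined is sketched below, under stated assumptions: alongside the real entries, the dictionary keeps a set of proper prefixes that are "not valid entries" on their own but signal that looking ahead is worthwhile. The class `PrefixLookAheadSketch` and its `add` and `find` methods are hypothetical and do not exist in OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not OpenNLP API.
class PrefixLookAheadSketch {
    private final Set<List<String>> entries = new HashSet<>();
    // Proper prefixes of entries: not matches themselves, only look-ahead hints.
    private final Set<List<String>> prefixes = new HashSet<>();

    void add(String... tokens) {
        List<String> entry = new ArrayList<>(Arrays.asList(tokens));
        entries.add(entry);
        for (int len = 1; len < entry.size(); len++) {
            prefixes.add(new ArrayList<>(entry.subList(0, len)));
        }
    }

    // Returns {start, endExclusive} for the longest match at each position.
    List<int[]> find(String[] tokens) {
        List<String> all = Arrays.asList(tokens);
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int matchEnd = -1;
            for (int end = start + 1; end <= tokens.length; end++) {
                List<String> candidate = all.subList(start, end);
                if (entries.contains(candidate)) {
                    matchEnd = end; // remember the longest real entry so far
                }
                if (!prefixes.contains(candidate)) {
                    break; // no entry starts with this span, stop looking ahead
                }
            }
            if (matchEnd > 0) {
                spans.add(new int[] {start, matchEnd});
                start = matchEnd;
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        PrefixLookAheadSketch finder = new PrefixLookAheadSketch();
        finder.add("folic", "acid");
        List<int[]> spans = finder.find(new String[] {"take", "folic", "acid", "daily"});
        for (int[] span : spans) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

Because the loop stops as soon as no entry begins with the current span, the look-ahead is bounded by the longest dictionary entry, which avoids the open-ended search that makes option 1 unattractive.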

          People

            jkosin James Kosin