[STANBOL-1091] EntityLinking Engine should not process the same tokens twice - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: enhancement-engines-0.10.0
Component/s: Enhancement Engines
Labels: None

Description

The EntityLinking Engine currently processes the text section by section (typically by Sentence, if Sentence annotations are present). However, when multiple NLP frameworks process the parsed text, their Sentence annotations may overlap. In such cases the EntityLinkingEngine first processes the Sentence with the earlier start and/or later end position, but it then also processes the other Sentence that is (partially) covered by the first one. As a result, Tokens and Chunks contained in two (or more) overlapping Sentence annotations are processed twice.

To avoid this, the EntityLinking Engine should keep track of the Tokens that were already processed and ignore the already processed parts of overlapping Sentences.

NOTE: This will not have any effect on the Entity Linking results. However, it will prevent unnecessary processing steps in the cases described above.
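The tracking idea described above can be illustrated with a minimal sketch. It is not the Stanbol implementation: the `Span` class, the `tokensToProcess` method, and the offsets used are hypothetical, and the sketch assumes that Tokens are sorted by start offset and do not overlap each other. Under those assumptions, remembering the end offset of the last processed Token is enough to skip Tokens that an earlier overlapping Sentence already covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverlapSkipSketch {

    /** Hypothetical character span [start, end); not a Stanbol class. */
    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns the tokens in the order they would be processed, visiting each
     * token at most once even if the given sentences overlap.
     * Assumes tokens are sorted by start offset and do not overlap.
     */
    public static List<Span> tokensToProcess(List<Span> sentences, List<Span> tokens) {
        // Earlier start first; for equal starts the longer sentence first.
        List<Span> ordered = new ArrayList<>(sentences);
        ordered.sort((a, b) -> a.start != b.start
                ? Integer.compare(a.start, b.start)
                : Integer.compare(b.end, a.end));

        List<Span> result = new ArrayList<>();
        int processedUpTo = Integer.MIN_VALUE; // end offset of the last processed token
        for (Span sentence : ordered) {
            for (Span token : tokens) {
                boolean inside = token.start >= sentence.start && token.end <= sentence.end;
                if (!inside || token.start < processedUpTo) {
                    continue; // outside this sentence, or already processed
                }
                result.add(token);
                processedUpTo = token.end;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two overlapping sentences, e.g. produced by two different NLP frameworks.
        List<Span> sentences = Arrays.asList(new Span(0, 20), new Span(10, 30));
        List<Span> tokens = Arrays.asList(
                new Span(0, 5), new Span(6, 9), new Span(10, 15),
                new Span(16, 20), new Span(21, 25), new Span(26, 30));
        System.out.println(tokensToProcess(sentences, tokens).size());
    }
}
```

In this example the Tokens at offsets 10 to 20 lie in both Sentences but are returned only once; the second Sentence contributes only the Tokens that start after the last processed offset.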

Issue Links

breaks: STANBOL-1070 Entity Co-Mention Engine (Resolved)

People

Assignee: Rupert Westenthaler

Reporter: Rupert Westenthaler

Votes: 0

Watchers: 1

Dates

Created: 05/Jun/13 08:14

Updated: 23/Jul/13 14:04

Resolved: 05/Jun/13 08:19