[STANBOL-617] Define how TopicEnhancements are written to the Enhancement Structure - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.9.0-incubating
Fix Version/s: enhancement-engines-0.10.0
Component/s: Enhancement Engines, Enhancer
Labels: None

Description

In the future, three Enhancement Engines will annotate Topics extracted from analyzed ContentItems:

Topic Engine
Zemanta Engine
CELI Classification Engine (see STANBOL-583)

While all three annotate Topics in very similar ways, there are some small variations that need to be aligned to make the annotations easier for users to consume.

Topic Annotations are a special type of annotation that is very similar to a fise:EntityAnnotation. The following listing shows the expected triples:

(1) ?ta rdf:type fise:TopicAnnotation
(2) ?ta fise:entity-reference ?topic-uri
(3) ?ta fise:entity-label ?topic-label
(4) ?ta fise:entity-type ?topic-type
(5) ?ta dc:relation ?textAnnotation

(6) ?textAnnotation rdf:type fise:TextAnnotation
(7) ?textAnnotation fise:start ?sectionStartPos
(8) ?textAnnotation fise:end ?sectionEndPos
(9) ?textAnnotation dc:type skos:Concept

(1,3,5,6,9) are required
(2) defines the URI of the assigned Topic. This might not be available if the Topic has only a label and has not been formally assigned a unique ID.
(4) defines the type of the Topic. It is strongly suggested to use skos:Concept as the type.

(6,7,8) link the fise:TopicAnnotation with the text. (7,8) are required if a topic needs to be assigned to a sub-section of the analyzed content.
NOTE: fise:selected-text and fise:selection-context are not used in this example because the text could be very large for bigger sections. Here we would need to define a better way to express the context for TextAnnotations that select whole sections of the parsed content.
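
The triple pattern described above can be sketched in a few lines of Python. This is only an illustration, not Stanbol code: the function name `topic_annotation_triples`, the plain-tuple representation and the example URNs are hypothetical. It also assumes that the fise:TextAnnotation is a separate resource that the topic annotation points to via dc:relation.

```python
from typing import List, Optional, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def topic_annotation_triples(
    topic_ann: str,
    text_ann: str,
    label: str,
    topic_uri: Optional[str] = None,
    topic_type: Optional[str] = "skos:Concept",
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[Triple]:
    """Build the triples (1)-(9) for one topic annotation (hypothetical helper)."""
    triples: List[Triple] = [
        (topic_ann, "rdf:type", "fise:TopicAnnotation"),  # (1) required
        (topic_ann, "fise:entity-label", label),          # (3) required
        (topic_ann, "dc:relation", text_ann),             # (5) required
        (text_ann, "rdf:type", "fise:TextAnnotation"),    # (6) required
        (text_ann, "dc:type", "skos:Concept"),            # (9) required
    ]
    if topic_uri is not None:
        # (2) only present when the topic has a formal identifier
        triples.append((topic_ann, "fise:entity-reference", topic_uri))
    if topic_type is not None:
        # (4) skos:Concept is the suggested default
        triples.append((topic_ann, "fise:entity-type", topic_type))
    if start is not None and end is not None:
        # (7,8) only needed when the topic applies to a sub-section
        triples.append((text_ann, "fise:start", str(start)))
        triples.append((text_ann, "fise:end", str(end)))
    return triples
```

Calling the helper with only a label yields the five required triples plus the default type; passing `topic_uri`, `start` and `end` adds (2), (7) and (8).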

As far as I know, the Topic Engine already follows this approach. The Zemanta Engine and the CELI Classification Engine need to be adapted (as part of this issue) to conform to the defined structure.
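
When adapting an engine, a small conformance check over its output can help. The sketch below is a hypothetical helper (the name `missing_required` and the tuple-based triple representation are assumptions, not part of Stanbol). It reports which of the required statements (1,3,5,6,9) are missing for a given topic annotation, assuming the fise:TextAnnotation is the resource referenced via dc:relation.

```python
from typing import Iterable, List, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def missing_required(triples: Iterable[Triple], topic_ann: str) -> List[int]:
    """Return the numbers of the required statements that are absent."""
    ts = set(triples)
    missing: List[int] = []
    if (topic_ann, "rdf:type", "fise:TopicAnnotation") not in ts:
        missing.append(1)
    if not any(s == topic_ann and p == "fise:entity-label" for s, p, _ in ts):
        missing.append(3)
    # Resources the topic annotation points to via dc:relation
    related = [o for s, p, o in ts if s == topic_ann and p == "dc:relation"]
    if not related:
        missing.append(5)
    if not any((o, "rdf:type", "fise:TextAnnotation") in ts for o in related):
        missing.append(6)
    if not any((o, "dc:type", "skos:Concept") in ts for o in related):
        missing.append(9)
    return missing
```

An empty result means all required statements are present; the optional statements (2,4,7,8) are deliberately not checked.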

Issue Links

is related to

STANBOL-197 Enhancement Engine for Wikipedia/DBpedia-based topic classification of text content (Closed)

STANBOL-583 CELI enhancement engine(s) - Contribution to stanbol (Closed)

People

Assignee: Rupert Westenthaler

Reporter: Rupert Westenthaler

Votes: 0

Watchers: 1

Dates

Created: 17/May/12 08:15

Updated: 12/Apr/13 08:38

Resolved: 14/Dec/12 05:11