Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- 0.9.0-incubating
- None
Description
In the future, three Enhancement Engines will annotate Topics extracted from analyzed ContentItems:
- Topic Engine
- Zemanta Engine
- CELI Classification Engine (see STANBOL-583)
While all three annotate Topics in a very similar way, there are some small variations that need to be aligned to make it easier for users to consume those annotations.
A Topic Annotation is a special type of annotation that is very similar to a fise:EntityAnnotation. The following listing shows the expected triples:
(1) ?ta rdf:type fise:TopicAnnotation
(2) ?ta fise:entity-reference ?topic-uri
(3) ?ta fise:entity-label ?topic-label
(4) ?ta fise:entity-type ?topic-type
(5) ?ta dc:relation ?textAnnotation
(6) ?textAnnotation rdf:type fise:TextAnnotation
(7) ?textAnnotation fise:start ?sectionStartPos
(8) ?textAnnotation fise:end ?sectionEndPos
(9) ?textAnnotation dc:type skos:Concept
(1,3,5,6,9) are required.
(2) defines the URI of the assigned Topic. It might not be available if the Topic only has a label and has not been formally assigned a unique ID.
(4) defines the type of the Topic. It is strongly suggested to use skos:Concept as the type.
(6,7,8) link the fise:TopicAnnotation with the text. (7,8) are required if a topic needs to be assigned to a sub-section of the analyzed content.
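The required/optional split above can be sketched as a small conformance check. This is only an illustration, not Stanbol code: triples are plain Python tuples of prefixed names, the names `check_topic_annotation`, `ex:topic1` and `ex:text1` are hypothetical, and it assumes that triples (6) to (9) describe the fise:TextAnnotation node reached via dc:relation.

```python
from collections import defaultdict

# Required predicates; None means "any value is acceptable".
REQUIRED_TOPIC = {
    "rdf:type": "fise:TopicAnnotation",  # (1)
    "fise:entity-label": None,           # (3)
    "dc:relation": None,                 # (5)
}
REQUIRED_TEXT = {
    "rdf:type": "fise:TextAnnotation",   # (6)
    "dc:type": "skos:Concept",           # (9)
}


def index(triples):
    """Group triples as subject -> predicate -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        graph[subject][predicate].add(obj)
    return graph


def missing(graph, node, required):
    """List required predicates that node lacks or has with another value."""
    props = graph.get(node, {})
    problems = []
    for predicate, expected in required.items():
        values = props.get(predicate, set())
        ok = bool(values) if expected is None else expected in values
        if not ok:
            problems.append(f"{node} {predicate}")
    return problems


def check_topic_annotation(triples, topic_annotation):
    """Check the topic annotation and every node it links via dc:relation."""
    graph = index(triples)
    problems = missing(graph, topic_annotation, REQUIRED_TOPIC)
    linked = graph.get(topic_annotation, {}).get("dc:relation", set())
    for text_annotation in sorted(linked):
        problems += missing(graph, text_annotation, REQUIRED_TEXT)
    return problems


# Hypothetical annotation carrying all nine triples from the listing.
example = [
    ("ex:topic1", "rdf:type", "fise:TopicAnnotation"),
    ("ex:topic1", "fise:entity-reference", "ex:concept/sports"),
    ("ex:topic1", "fise:entity-label", "Sports"),
    ("ex:topic1", "fise:entity-type", "skos:Concept"),
    ("ex:topic1", "dc:relation", "ex:text1"),
    ("ex:text1", "rdf:type", "fise:TextAnnotation"),
    ("ex:text1", "fise:start", 0),
    ("ex:text1", "fise:end", 1200),
    ("ex:text1", "dc:type", "skos:Concept"),
]

print(check_topic_annotation(example, "ex:topic1"))  # prints []
```

Dropping the optional triples (2), (4), (7) or (8) from `example` still yields an empty list, while dropping a required one, such as the fise:entity-label, is reported as a problem.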
NOTE: fise:selected-text and fise:selection-context are not used in this example, as those texts could be very large for bigger sections. We would need to define a better way to specify the context for TextAnnotations that select whole sections of the parsed content.
As far as I know, the TopicEngine already follows this approach. The ZemantaEngine and the CELI Classification Engine need to be adapted (as part of this issue) to conform to the defined structure.
Issue Links
- is related to
  - STANBOL-197 Enhancement Engine for Wikipedia/DBpedia-based topic classification of text content (Closed)
  - STANBOL-583 CELI enhancement engine(s) - Contribution to stanbol (Closed)