Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Description
Currently, all EnhancementEngines that create TextAnnotations use TypedLiterals of type xsd:string for the values of the fise:selected-text and fise:selection-context properties. However, both values are natural language text, so it would be better to use PlainLiterals and to add the language detected for the parsed content.
Example:
Parsed content: "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley."
Detected language: "en"
Text Annotations: "Paris" and "Bob Marley"
Currently, the selection context and the selected text would be represented like this:
<fise:selection-context rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.</fise:selection-context>
<fise:selected-text rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Bob Marley</fise:selected-text>
After this issue is resolved, the same information would be represented like this:
<fise:selection-context xml:lang="en">The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.</fise:selection-context>
<fise:selected-text xml:lang="en">Bob Marley</fise:selected-text>
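The difference between the two representations can be sketched in a few lines of Python. This is only an illustration of the serialization, not Stanbol code: the helper name literal_element is hypothetical, and the snippet builds the RDF/XML element as a plain string rather than going through an RDF library.

```python
from xml.sax.saxutils import escape, quoteattr

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def literal_element(prop, text, lang=None):
    """Render one RDF/XML property element (hypothetical helper).

    With a language tag the value is written as a plain literal
    (xml:lang); without one it falls back to the old form, a typed
    literal of type xsd:string.
    """
    if lang:
        attr = f"xml:lang={quoteattr(lang)}"
    else:
        attr = f"rdf:datatype={quoteattr(XSD_STRING)}"
    return f"<{prop} {attr}>{escape(text)}</{prop}>"


# Old form: typed literal, no language information.
old = literal_element("fise:selected-text", "Bob Marley")
# New form: plain literal carrying the detected language.
new = literal_element("fise:selected-text", "Bob Marley", lang="en")
```

An RDF-aware consumer treats the two forms as different literals, so code that compares or queries these values needs to account for the change.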
Advantages:
- The suggested representation is more in line with the semantics of the values, which are natural language text rather than opaque strings.
- Engines that consume text selections could use the language provided by the TextAnnotation itself. This would make it possible to search for entities correctly in documents that contain parts in multiple languages.
- Such engines could still fall back to the language annotation of the document if a TextAnnotation provides no language (backward compatibility).
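The lookup order described in the last two points could look roughly like the following sketch. The TextAnnotation class and the lookup_language function are hypothetical stand-ins written for this example; they are not part of the Stanbol API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextAnnotation:
    """Hypothetical, simplified stand-in for a fise:TextAnnotation."""
    selected_text: str
    # Language tag of the plain literal; None for old-style typed literals.
    language: Optional[str] = None


def lookup_language(annotation: TextAnnotation,
                    document_language: Optional[str]) -> Optional[str]:
    """Prefer the language of the annotation, else the document language."""
    return annotation.language or document_language


# A mixed-language document: the annotation-level language wins.
tagged = TextAnnotation("Bob Marley", language="en")
# An annotation written by an engine that does not set a language yet.
untagged = TextAnnotation("Paris")

lang_tagged = lookup_language(tagged, document_language="de")
lang_untagged = lookup_language(untagged, document_language="de")
```

With this order, engines that already rely on the document-level language keep working unchanged for annotations that carry no language tag.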