Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.9.0-incubating
    • Fix Version/s: enhancer-0.10.0
    • Component/s: Enhancer
    • Labels:
      None

      Description

      With the addition of the CELI Language Identification Engine, there are now two different engines that support the same feature.

      However, engines that consume the detected language are currently "hard coded" to the LangId Engine (enhancer/engines/langid). This needs to change to allow the adoption of alternatives, such as the CELI-based implementation.

      The suggestion is to use the following pattern to extract the language:

      (1) via Annotations:

      ?x rdf:type fise:TextAnnotation .
      ?x dc:language ?language .
      OPTIONAL { ?x dc:created ?engine }
      OPTIONAL { ?x fise:confidence ?confidence }

      (2) via ContentItem metadata

      ?ci dc:language ?language

      (2) is a fallback if (1) delivers no results.

      The following methods are added to the EnhancementEngineHelper utility of the enhancer.servicesapi module:

      • extract the language (with the highest confidence), including fallback to (2)
      • extract all languages (sorted by confidence), including fallback to (2)
      • extract all TextAnnotations with dc:language values
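The lookup order described above (annotations first, ContentItem metadata as fallback) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class LanguageLookup, the record LanguageAnnotation, and the method names getLanguages and getLanguage are hypothetical stand-ins and are not the actual EnhancementEngineHelper API. The sketch works on plain Java values instead of an RDF graph, and it assumes that annotations without a fise:confidence value (the property is OPTIONAL in the pattern) rank last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LanguageLookup {

    /** Hypothetical stand-in for a fise:TextAnnotation that carries a dc:language value. */
    record LanguageAnnotation(String language, Double confidence) {}

    /**
     * All languages from pattern (1), sorted by confidence (highest first).
     * Falls back to the ContentItem language of pattern (2) if (1) yields nothing.
     */
    static List<String> getLanguages(List<LanguageAnnotation> annotations,
                                     String contentItemLanguage) {
        List<LanguageAnnotation> sorted = new ArrayList<>(annotations);
        // Assumption: a missing confidence is treated as lower than any real value.
        sorted.sort(Comparator.comparingDouble(
                (LanguageAnnotation a) -> a.confidence() == null ? -1.0 : a.confidence())
                .reversed());
        List<String> languages = new ArrayList<>();
        for (LanguageAnnotation a : sorted) {
            if (!languages.contains(a.language())) {
                languages.add(a.language()); // keep the first (best) occurrence only
            }
        }
        if (languages.isEmpty() && contentItemLanguage != null) {
            languages.add(contentItemLanguage); // fallback (2)
        }
        return languages;
    }

    /** The language with the highest confidence, including the fallback to (2). */
    static Optional<String> getLanguage(List<LanguageAnnotation> annotations,
                                        String contentItemLanguage) {
        return getLanguages(annotations, contentItemLanguage).stream().findFirst();
    }

    public static void main(String[] args) {
        List<LanguageAnnotation> annotations = List.of(
                new LanguageAnnotation("en", 0.6),
                new LanguageAnnotation("de", 0.9),
                new LanguageAnnotation("fr", null));
        System.out.println(getLanguages(annotations, "it"));
        System.out.println(getLanguage(List.of(), "it"));
    }
}
```

In this sketch the input list plays the role of the third method (all TextAnnotations with dc:language values); a consuming engine would only call getLanguage and would no longer need to know which engine produced the annotation.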

          Activity

          rwesten Rupert Westenthaler added a comment -

          fise:TextAnnotations describing the language of an analyzed text should also use "http://purl.org/dc/terms/LinguisticSystem" as dc:type:

          ?la rdf:type fise:TextAnnotation, fise:Enhancement
          ?la dc:type dc:LinguisticSystem
          ?la dc:language ?lang

          and all properties required by fise:Enhancement
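A consumer could use this convention to recognise language annotations independently of the engine that created them. The following is a minimal sketch, not Stanbol code: the class LanguageAnnotationFilter and the method isLanguageAnnotation are hypothetical, an annotation is modelled as a plain map from property URI to values, and the full URIs used for dc:type and dc:language are an assumption based on the dc/terms namespace quoted in the comment above.

```java
import java.util.Map;
import java.util.Set;

public class LanguageAnnotationFilter {

    // Assumed expansions of the dc: prefix (dc/terms namespace, as in the quoted URI).
    static final String DC_TYPE = "http://purl.org/dc/terms/type";
    static final String DC_LANGUAGE = "http://purl.org/dc/terms/language";
    static final String DC_LINGUISTIC_SYSTEM = "http://purl.org/dc/terms/LinguisticSystem";

    /**
     * True if the annotation has dc:type dc:LinguisticSystem and at least one
     * dc:language value. The annotation is a hypothetical property-to-values map.
     */
    static boolean isLanguageAnnotation(Map<String, Set<String>> annotation) {
        Set<String> types = annotation.getOrDefault(DC_TYPE, Set.of());
        Set<String> languages = annotation.getOrDefault(DC_LANGUAGE, Set.of());
        return types.contains(DC_LINGUISTIC_SYSTEM) && !languages.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> annotation = Map.of(
                DC_TYPE, Set.of(DC_LINGUISTIC_SYSTEM),
                DC_LANGUAGE, Set.of("en"));
        System.out.println(isLanguageAnnotation(annotation));
    }
}
```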

          rwesten Rupert Westenthaler added a comment -

          Fixed with #1339560 in the trunk and merged back into the CELI engine branch.


            People

            • Assignee:
              rwesten Rupert Westenthaler
              Reporter:
              rwesten Rupert Westenthaler