Tika / TIKA-491

Add language identification support for Norwegian Bokmål and Norwegian Nynorsk

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7
    • Fix Version/s: None
    • Component/s: languageidentifier
    • Labels: None

      Description

      Currently there is only one Norwegian language profile in Tika: "no". We need to distinguish between the two official written Norwegian languages, Bokmål and Nynorsk, identified by the ISO 639-1 codes "nb" and "nn". These codes are recommended over the generic "no" tag.

      Proposed solution: remove the current language profile no.ngp and replace it with two new profiles, one for nb and one for nn.

      We must also add tests for Norwegian.
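
      For illustration only, here is a minimal sketch of how separate nb and nn profiles could tell the two written standards apart by comparing character trigram counts. The sample sentences and the names SAMPLES, PROFILES, trigrams, cosine, and identify are assumptions made up for this sketch. It is not Tika's language identifier code and does not read the .ngp profile format; real profiles would be built from large corpora.

```python
from collections import Counter
from math import sqrt

# Toy training sentences (hypothetical; real profiles need large corpora).
SAMPLES = {
    "nb": "jeg vet ikke hva jeg skal gjøre en uke fra nå",
    "nn": "eg veit ikkje kva eg skal gjere ei veke frå no",
}


def trigrams(text):
    """Count character trigrams, marking word boundaries with '_'."""
    padded = "_" + "_".join(text.lower().split()) + "_"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# One profile per language code, in place of a single shared "no" profile.
PROFILES = {code: trigrams(text) for code, text in SAMPLES.items()}


def identify(text):
    """Return the code of the profile most similar to the input text."""
    grams = trigrams(text)
    return max(PROFILES, key=lambda code: cosine(grams, PROFILES[code]))
```

      A test for Norwegian could then assert that a Bokmål phrase such as "jeg vet ikke" maps to nb and its Nynorsk counterpart "eg veit ikkje" maps to nn, which a single shared "no" profile cannot express.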

        Activity

        Pander Musubi added a comment:

        Please see also https://issues.apache.org/jira/browse/TIKA-369 proposing to use https://code.google.com/p/language-detection/ for improved language detection.

          People

          • Assignee: Unassigned
          • Reporter: Jan Høydahl
          • Votes: 1
          • Watchers: 2
