Tika / TIKA-3620

Language detection documentation needs attention

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: languageidentifier
    • Labels: None

    Description

      The language identifier/detection documentation suffers from a few problems:

      1. Clarity is needed on identifier/identification vs. detector/detection. Which is it? The source code says "identifier", whereas the documentation is nested under "detection".
      2. The link to org.apache.tika.language.LanguageIdentifier returns a 404. What is it meant to resolve to?
      3. Generally speaking, the documentation is practically non-existent. I checked the wiki and failed to find anything. I did find some minor documentation, but it is also severely lacking. Note the broken hyperlink as well.

      Some suggestions for improvement:

      1. Fix the broken hyperlinks.
      2. Hyperlink to the existing examples, namely LanguageDetectorExample.java, LanguageDetectingParser.java and Language.java.
      3. Hyperlink to the LanguageDetector Javadoc and at least mention some of the other implementations.

          People

            Assignee: Lewis John McGibbney
            Reporter: Lewis John McGibbney