  Tika / TIKA-568

Language Detection isReasonablyCertain() hides valuable information


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: languageidentifier
    • Labels: None

    Description

      LanguageIdentifier.isReasonablyCertain() hardcodes the threshold used to decide whether a language detection result is reliable. A sensible default is fine, but applications should be able to choose the threshold that suits them. For instance, how was the value 0.022 chosen?
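      A minimal sketch of what a caller-controlled threshold could look like, assuming (as the 0.022 figure suggests) that certainty is decided by comparing a distance score against a fixed limit, with smaller distances meaning a closer match. The names `CertaintyCheck`, `DEFAULT_THRESHOLD`, and `getThreshold()` are hypothetical and chosen for illustration only; this is not Tika's API and not the contents of TIKA-568.patch.

```java
/**
 * Hypothetical sketch (not Tika's API): a certainty check whose threshold
 * is supplied by the caller instead of being hardcoded.
 */
public final class CertaintyCheck {

    /** Default mirroring the 0.022 value mentioned in this issue. */
    public static final double DEFAULT_THRESHOLD = 0.022;

    private final double threshold;

    /** Uses the default threshold, preserving the current behaviour. */
    public CertaintyCheck() {
        this(DEFAULT_THRESHOLD);
    }

    /** Lets the application pick its own threshold. */
    public CertaintyCheck(double threshold) {
        if (Double.isNaN(threshold) || threshold < 0) {
            throw new IllegalArgumentException("threshold must be non-negative: " + threshold);
        }
        this.threshold = threshold;
    }

    /** Assumption: a smaller distance means a closer match to the language profile. */
    public boolean isReasonablyCertain(double distance) {
        return distance < threshold;
    }

    public double getThreshold() {
        return threshold;
    }

    public static void main(String[] args) {
        double distance = 0.03; // hypothetical distance reported by a detector
        System.out.println(new CertaintyCheck().isReasonablyCertain(distance));     // false
        System.out.println(new CertaintyCheck(0.05).isReasonablyCertain(distance)); // true
    }
}
```

      Keeping the no-argument constructor means existing callers see no change, while an application that tolerates noisier input can pass a looser threshold.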

      Attachments

        1. TIKA-568.patch (0.6 kB, Grant Ingersoll)


            People

              Assignee: Kenneth William Krugler
              Reporter: Grant Ingersoll
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated: