Tika / TIKA-322

Improve encoding detection speed and accuracy


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: mime
    • Labels: None

    Description

      The encoding detection code we took from ICU4J is not very efficient and sometimes produces odd results when more than one encoding matches the given input data. It would be good to refactor the code to be faster for easy-to-detect encodings and to have better heuristics in case multiple matches are found.
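      The following minimal Java sketch illustrates what a fast path for easy-to-detect encodings could look like. It is not the code committed for this issue, and the class EncodingSniffer and method sniff are hypothetical names. The sketch assumes that three inputs count as "easy": a byte order mark, pure 7-bit data, and well-formed UTF-8. For anything else it returns null, which signals that a full statistical detector (for example ICU4J's CharsetDetector) still has to run.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical fast-path sniffer: settles cheap, unambiguous cases and
 * returns null when a slower statistical detector is still needed.
 */
final class EncodingSniffer {

    private EncodingSniffer() {}

    static String sniff(byte[] data) {
        // 1. Byte order marks are unambiguous (UTF-32 BOMs are ignored in this sketch).
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. Pure 7-bit data decodes identically in every ASCII-compatible charset.
        boolean ascii = true;
        for (byte b : data) {
            if (b < 0) { // high bit set
                ascii = false;
                break;
            }
        }
        if (ascii) return "US-ASCII";

        // 3. Non-ASCII bytes that form valid UTF-8 are unlikely to be anything else.
        if (isValidUtf8(data)) return "UTF-8";

        // 4. Ambiguous input: defer to the statistical detector.
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    private static boolean isValidUtf8(byte[] data) {
        // A strict decoder throws on the first malformed sequence.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

      The checks are ordered from cheapest to most expensive, so most plain-text input never reaches the statistical stage. As an example, "héllo" encoded as ISO-8859-1 fails the UTF-8 check (the lone 0xE9 byte is malformed) and falls through to null. That is exactly the ambiguous case where better tie-breaking heuristics between multiple candidate matches would apply.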


    People

    • Assignee: Jukka Zitting (jukkaz)
    • Reporter: Jukka Zitting (jukkaz)
    • Votes: 1
    • Watchers: 1