Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-322

Improve encoding detection speed and accuracy

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: mime
    • Labels:
      None

      Description

      The encoding detection code we took from ICU4J is not very efficient and sometimes produces odd results when more than one encoding matches the given input data. It would be good to refactor the code to be faster for easy-to-detect encodings and to have better heuristics in case multiple matches are found.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                jukkaz Jukka Zitting
                Reporter:
                jukkaz Jukka Zitting
              • Votes:
                1 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: