  Tika / TIKA-867

UTF-8 encoding does not work on Windows


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.0
    • Fix Version/s: None
    • Component/s: cli
    • Labels: None
    • Environment: Windows 7 Enterprise (Java 1.6.0_31) and Mac OS X 10.7.3 (Java 1.6.0_30)

    Description

      When Tika is called as a command-line tool from within Java and its output buffer is decoded as UTF-8 (e.g. new String(buffer, 0, len, Charset.forName("UTF-8"))), the behaviour on Windows differs from that on Mac OS X.
      On Windows the encoding seems to be wrong: the expected "Währung" comes back as "W?hrung". Other tools, such as exiftool, work as expected.
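
      The symptom above can be reproduced without Tika. The sketch below is only an illustration, under the assumption that the child process wrote its output in a non-UTF-8 platform default charset while the caller decoded it as UTF-8. The class name EncodingMismatchDemo and the helper roundTrip are hypothetical, and ISO-8859-1 merely stands in for whatever default a Windows machine might use.

```java
import java.nio.charset.Charset;

class EncodingMismatchDemo {
    // Encode text with one charset and decode the bytes with another,
    // mimicking a writer (child process) and a reader (caller) that
    // disagree on the encoding.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] buffer = text.getBytes(writer);
        return new String(buffer, 0, buffer.length, reader);
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        // Assumption: ISO-8859-1 stands in for a non-UTF-8 platform default.
        Charset latin1 = Charset.forName("ISO-8859-1");
        String word = "W\u00e4hrung"; // "Währung"

        // Writer and reader agree: the umlaut survives.
        System.out.println(roundTrip(word, utf8, utf8));
        // Writer used Latin-1, reader assumed UTF-8: the single byte 0xE4 is
        // not valid UTF-8, so the decoder substitutes U+FFFD, which many
        // consoles display as "?".
        System.out.println(roundTrip(word, latin1, utf8));
        // The default differs between platforms and JVM configurations.
        System.out.println("Default charset here: " + Charset.defaultCharset());
    }
}
```

      If this is indeed the cause, the bytes are not corrupted by the reader. They were never UTF-8 in the first place, so either the caller has to decode with the charset the child process actually used, or the child process has to be told to emit UTF-8.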

      Attachments

        1. TIKA-867.patch
          0.9 kB
          John Mastarone

          People

            Assignee: Unassigned
            Reporter: Wolfgang Außerlechner (tika_wau)
            Votes: 1
            Watchers: 3
