  Tika / TIKA-3834

Tika-Server cannot extract the text of a document encoded in GB18030.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Resolved
    • 2.3.0
    • 2.3.0, 2.4.1
    • tika-server
    • Linux

    Description

      There are two files:

      111.csv (Content-Encoding: UTF-8)

      112.csv (Content-Encoding: GB18030)
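      To illustrate why the encoding of the second file matters, here is a minimal Python sketch. The sample text is an assumption, since the actual contents of 111.csv and 112.csv are not shown in this report. It only demonstrates that the same characters yield different bytes in UTF-8 and GB18030, and that GB18030 bytes are generally not valid UTF-8; it says nothing about how Tika itself detects the charset.

```python
# Hypothetical sample text; the real attachment contents are not shown in the report.
sample = "中文,测试\n"

utf8_bytes = sample.encode("utf-8")
gb18030_bytes = sample.encode("gb18030")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same characters map to different byte sequences in each encoding.
print(utf8_bytes != gb18030_bytes)

# Decoding with the matching charset recovers the original text.
print(gb18030_bytes.decode("gb18030") == sample)

# A consumer that assumes UTF-8 cannot decode the GB18030 bytes.
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(gb18030_bytes))
```

      A consumer that guesses the wrong charset for 112.csv would therefore produce an error or garbled text rather than the expected content.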

       

      Tika-app extracts the text of both files:

      java -jar tika-app-1.24.1.jar -t 111.csv

      java -jar tika-app-1.24.1.jar -t 112.csv

       

      Tika-server returns the text of 111.csv:

      curl -T 111.csv http://127.0.0.1:12000/tika --header "Accept: text/plain"

      However, Tika-server does not return the text of 112.csv:

      curl -T 112.csv http://127.0.0.1:12000/tika --header "Accept: text/plain"
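      For anyone reproducing this from a script rather than curl, the requests above can be sketched in Python roughly as follows. The helper name `build_tika_request` is hypothetical, and the host and port are simply copied from the commands in this report. The sketch mirrors what `curl -T` does over HTTP (a PUT with the file as the request body) and sets the same Accept header.

```python
import urllib.request


def build_tika_request(body: bytes,
                       base_url: str = "http://127.0.0.1:12000") -> urllib.request.Request:
    """Build a PUT request to the /tika endpoint asking for a plain-text response.

    Hypothetical helper for illustration; base_url is taken from the report.
    """
    return urllib.request.Request(
        url=base_url + "/tika",
        data=body,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


# Usage (requires a running tika-server, so it is left commented out):
# with open("112.csv", "rb") as f:
#     req = build_tika_request(f.read())
# with urllib.request.urlopen(req) as resp:
#     # Assumption: the plain-text response body is UTF-8.
#     print(resp.read().decode("utf-8"))
```

      Sending the raw bytes unchanged keeps the test faithful to the original file encoding, so any difference between 111.csv and 112.csv comes from the server side.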

       

      Attachments

        1. 111.csv
          0.3 kB
          Di Dongke
        2. 112.csv
          0.2 kB
          Di Dongke


          People

            Assignee: Unassigned
            Reporter: Di Dongke
            Votes: 0
            Watchers: 2
