Tika / TIKA-3714

Cannot correctly retrieve a file whose path contains non-ASCII characters

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: server
    • Labels: None

    Description

      Steps to reproduce:

      Call the REST endpoint to detect the media type of a file that exists in the file system:

      curl --verbose -X PUT http://localhost:9998/detect/stream -H "fetcherName: minio-data" -H "fetchKey: 中文.docx" 

      However, the fetchKey header is not processed correctly: the non-ASCII value does not reach the server intact, so the fetch fails with a FileNotFound exception.

      According to the HTTP/1.1 RFC, non-US-ASCII characters cannot be sent reliably in HTTP header values. The current mechanism in tika-pipes (https://cwiki.apache.org/confluence/display/TIKA/tika-pipes#FileSystemEmitter) nevertheless uses HTTP headers to carry the file path, and it is very common for file paths to contain non-ASCII characters.

      Suggestion: support HTTP request parameters for fetcherName and fetchKey. Request parameters can carry non-ASCII characters correctly.
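
      The suggestion can be illustrated with a minimal Python sketch. It only shows that the key from the curl call is not ASCII, while a percent-encoded query string carrying the same key is pure ASCII and decodes back to the original value. The query parameter names simply mirror the header names and are an assumption: whether tika-server accepts them in this form is not confirmed by this report.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Values taken from the curl call in this report.
fetcher_name = "minio-data"
fetch_key = "中文.docx"


def is_ascii(value: str) -> bool:
    """Return True if the value can be encoded as US-ASCII."""
    try:
        value.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Assumption: the server would read fetcherName/fetchKey from the query
# string. The names mirror the existing headers and are hypothetical here.
query = urlencode({"fetcherName": fetcher_name, "fetchKey": fetch_key})
url = "http://localhost:9998/detect/stream?" + query

# What a server would recover after standard query-string decoding (UTF-8).
decoded = parse_qs(urlsplit(url).query)

if __name__ == "__main__":
    print(is_ascii(fetch_key))  # the raw key is not ASCII
    print(url)                  # the encoded URL is ASCII only
    print(decoded["fetchKey"][0])
```

      The design point is that percent-encoding is a standard part of URL handling, so the key round-trips without any custom header encoding on either side.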

            People

              Assignee: Unassigned
              Reporter: beam beamliu
              Votes: 0
              Watchers: 3
