Tika / TIKA-2105

Unable to process documents with French accents in their filenames


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.13
    • Fix Version/s: None
    • Component/s: batch
    • Labels:
      None
    • Environment:

      Windows 7, Java version 1.7.0.111

    • Flags:
      Patch, Important

      Description

      When I run the following batch script, test1.bat, from the command prompt, I get this error message:

      test1.bat
      @echo off
      "C:\Program Files (x86)\Java\jre7\bin\java" -jar c:\temp\tika-app-1.13.jar -m "S:\2008-09\2009-10\IC IT Environment 2009\français.docx"

      Error:
      Exception in thread "main" java.net.MalformedURLException: unknown protocol: s
      at java.net.URL.<init>(Unknown Source)
      at java.net.URL.<init>(Unknown Source)
      at java.net.URL.<init>(Unknown Source)
      at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:472)
      at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)

      When the filenames contain no French accented characters, it works fine. I cannot rename all of the files that need to be processed.
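      The stack trace shows TikaCLI.process handing the argument to the java.net.URL constructor. One plausible reading, assumed here rather than confirmed by this report, is that the CLI treats the argument as a URL whenever it does not resolve to an existing file. An accented filename that Java cannot find (for example, if the batch file and the console disagree on character encoding) would then be parsed as a URL whose scheme is "s". The class PathOrUrl and its resolve method are hypothetical names for a minimal plain-Java sketch of that fallback; they are not Tika's API.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical illustration, not Tika's actual source: resolve a command-line
// argument as a local file if it exists, otherwise fall back to parsing it as a URL.
public class PathOrUrl {

    static URL resolve(String arg) throws MalformedURLException {
        File file = new File(arg);
        if (file.isFile()) {
            // The argument names an existing file: convert it to a file: URL.
            return file.toURI().toURL();
        }
        // Not found as a file, so treat it as a URL. For "S:\...", the text before
        // the first ':' becomes the scheme "s", which has no handler, so the
        // constructor throws MalformedURLException("unknown protocol: s").
        return new URL(arg);
    }

    public static void main(String[] args) {
        // Assumed example path that does not exist on the machine running this sketch.
        String arg = "S:\\2008-09\\does-not-exist.docx";
        try {
            System.out.println("Resolved to " + resolve(arg));
        } catch (MalformedURLException e) {
            System.out.println("MalformedURLException: " + e.getMessage());
        }
    }
}
```

      On a machine where that path does not exist, this prints "MalformedURLException: unknown protocol: s", the same message as in the report. If the fallback works this way, the exception hides the underlying problem, which is that the accented path was not found as a file.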

      I apologise; my experience with Java and Tika is very limited.

      Thanks


            People

            • Assignee:
              Unassigned
            • Reporter:
              susserj
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: