Another cross-platform alternative is the command-line interface:
- Extracting structured text content from a file:
    java -jar tika-0.2-standalone.jar --xml /path/to/file
- Extracting plain text content from a file:
    java -jar tika-0.2-standalone.jar --text /path/to/file
- Extracting metadata from a file:
    java -jar tika-0.2-standalone.jar --metadata /path/to/file
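If another Java program needs the same output without running a server, it could call the standalone jar as a child process. What follows is a minimal sketch, not part of Tika. It assumes the jar name and the three flags from the commands above and Java 9+ (for `List.of` and `readAllBytes`). The class `TikaCli` and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper (not part of Tika) that shells out to the standalone jar. */
public class TikaCli {

    // Assumption: only the three flags shown in the CLI examples are accepted.
    static List<String> buildCommand(String jarPath, String flag, String filePath) {
        if (!List.of("--xml", "--text", "--metadata").contains(flag)) {
            throw new IllegalArgumentException("Unsupported flag: " + flag);
        }
        return List.of("java", "-jar", jarPath, flag, filePath);
    }

    // Runs the jar and returns its combined stdout/stderr as a string.
    static String run(String jarPath, String flag, String filePath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(jarPath, flag, filePath));
        pb.redirectErrorStream(true); // merge stderr into stdout for simplicity
        Process p = pb.start();
        String output;
        // Read all output before waiting, so a full pipe buffer cannot block the child.
        try (InputStream in = p.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("Tika CLI exited with code " + exit + ": " + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: TikaCli <--xml|--text|--metadata> <file>");
            System.exit(1);
        }
        // Assumption: the jar sits in the current working directory.
        System.out.print(run("tika-0.2-standalone.jar", args[0], args[1]));
    }
}
```

Because the file path is passed straight to a local process, access control is left to the operating system's file permissions, just as when running the commands by hand.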
This way you don't need a separate server process, and there is no risk of unauthorized users gaining access to your files.
I'm a bit concerned about any web service that allows a client to retrieve the contents of any file on the server's local file system. Would it make more sense to always require the client to upload the files it wants parsed?
Also, the file system traversal feature seems a bit outside the scope of Tika, though having something like this in a contrib area might be nice.