Affects Version/s: 1.1, 1.2
Fix Version/s: None
Apache Maven 2.2.1 (rdebian-6)
Java version: 1.6.0_26
Java home: /usr/lib/jvm/java-6-sun-1.6.0.26/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux" version: "3.0.0-17-generic-pae" arch: "i386" Family: "unix"
Both vorbis-java-tika-0.1.jar and tika-parsers-1.1-SNAPSHOT.jar contain their own, different copies of META-INF/services/org.apache.tika.parser.Parser, which the auto-detecting parser reads to configure itself.
AFAIK, only one of these can end up in a standalone OSGi JAR, because both have the same path.
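To see which copies of the service file are actually visible at runtime, a small diagnostic along these lines may help. It is a sketch, not part of Tika: the class name `ListServiceFiles` is hypothetical, it relies only on the standard `ClassLoader.getResources`, and it assumes Java 7+ (try-with-resources, `StandardCharsets`). Run against the unbundled classpath it should list both JARs; run against the standalone JAR it should list only the one copy that survived.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

class ListServiceFiles {
    static final String SERVICE = "META-INF/services/org.apache.tika.parser.Parser";

    // Returns the URL of every copy of the given resource visible to the class loader.
    static List<URL> findCopies(ClassLoader loader, String resource) throws IOException {
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (URL url : findCopies(loader, SERVICE)) {
            // Print where each copy comes from, followed by its contents.
            System.out.println("== " + url);
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("   " + line);
                }
            }
        }
    }
}
```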
On my system at least, the vorbis one gets included in the JAR, and not the tika-parsers one.
This means that the Tika server is capable of auto-detecting Vorbis files, but not Microsoft Office files, which is completely broken from my POV.
Unless Bnd (which is poorly documented) has some way to merge these files, I suggest one of the following:
- excluding the one from vorbis-java-tika (easy but removes Vorbis auto-detection);
- bundling vorbis-java-tika as an embedded JAR instead of inlined (might work?);
- including a manually combined copy of both service files in tika-server/src/main/resources (ugly, requires maintenance);
- finding or writing a maven plugin to merge these files (outside my maven-fu).
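Whichever way it is wired into the build, the merge itself is simple. Provider-configuration files, as read by `java.util.ServiceLoader`, list one fully qualified class name per line, with `#` starting a comment and surrounding whitespace ignored, so combining them amounts to an ordered, de-duplicated union of the non-comment lines. The sketch below is hypothetical (the class name `MergeServiceFiles` and its command-line arguments are illustrative, not an existing tool) and assumes Java 7+ for `java.nio.file`. It could produce the combined copy for the third option, or serve as the core of the fourth.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MergeServiceFiles {
    // Merges the lines of several provider-configuration files:
    // strips '#' comments and whitespace, skips blank lines, drops duplicates.
    static List<String> merge(List<List<String>> files) {
        Set<String> providers = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> lines : files) {
            for (String line : lines) {
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return new ArrayList<>(providers);
    }

    // Usage: java MergeServiceFiles <output> <input> [<input>...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: MergeServiceFiles <output> <input>...");
            System.exit(1);
        }
        List<List<String>> inputs = new ArrayList<>();
        for (String arg : Arrays.copyOfRange(args, 1, args.length)) {
            inputs.add(Files.readAllLines(Paths.get(arg), StandardCharsets.UTF_8));
        }
        Files.write(Paths.get(args[0]), merge(inputs), StandardCharsets.UTF_8);
    }
}
```

Using a `LinkedHashSet` keeps the providers in the order they first appear, so the tika-parsers entries can be listed first simply by passing that file first. The inputs would be the two copies extracted from the JARs; keeping the output in sync with upstream changes is still the maintenance burden noted above.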
My simple workaround, which probably removes Vorbis support completely, is this patch: