Tika / TIKA-893

Tika-server bundle includes wrong META-INF/services/org.apache.tika.parser.Parser, doesn't work

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1, 1.2
    • Fix Version/s: None
    • Component/s: packaging
    • Environment: Apache Maven 2.2.1 (rdebian-6)
      Java version: 1.6.0_26
      Java home: /usr/lib/jvm/java-6-sun-1.6.0.26/jre
      Default locale: en_GB, platform encoding: UTF-8
      OS name: "linux" version: "3.0.0-17-generic-pae" arch: "i386" Family: "unix"

    Description

      Both vorbis-java-tika-0.1.jar and tika-parsers-1.1-SNAPSHOT.jar include different copies of META-INF/services/org.apache.tika.parser.Parser, which the auto-detecting parser needs to configure itself.

      AFAIK, only one of these can end up in a standalone OSGi JAR when the dependencies are inlined, as both files have the same path.
      On my system at least, the vorbis one gets included in the JAR, and not the tika-parsers one.

      This means that the Tika server is capable of auto-detecting Vorbis files, but not Microsoft Office files, which is completely broken from my POV.
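To check which copy of the service file actually ended up in a built bundle, something like the following can help. This is only a minimal diagnostic sketch, not part of Tika: `parse_service_file` and `list_providers` are hypothetical helper names, and the script assumes the service file is UTF-8 text with one provider class per line and `#` comments.

```python
import zipfile

# Path of the service file discussed in this issue.
SERVICE_ENTRY = "META-INF/services/org.apache.tika.parser.Parser"


def parse_service_file(text):
    """Return provider class names, ignoring blank lines and '#' comments."""
    providers = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            providers.append(name)
    return providers


def list_providers(jar, entry=SERVICE_ENTRY):
    """List the providers registered in one JAR (a path or a file-like object).

    Returns an empty list if the JAR has no such entry.
    """
    with zipfile.ZipFile(jar) as zf:
        if entry not in zf.namelist():
            return []
        return parse_service_file(zf.read(entry).decode("utf-8"))
```

Running `list_providers` on the tika-server bundle and on tika-parsers-1.1-SNAPSHOT.jar, then comparing the two lists, should show whether the Microsoft Office parsers are missing from the bundle.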

      Unless the (undocumented) Bnd tool offers some way to merge these files, I suggest one of the following:

      • excluding the one from vorbis-java-tika (easy but removes Vorbis auto-detection);
      • bundling vorbis-java-tika as an embedded JAR instead of inlined (might work?);
      • including a manually combined copy of both service files in tika-server/src/main/resources (ugly, requires maintenance);
      • finding or writing a Maven plugin to merge these files (outside my Maven-fu).
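For the last two options, the merge itself is simple: concatenate the provider lists and drop comments and duplicates. A rough sketch of that step follows, assuming the usual one-class-per-line service file format; `merge_service_files` is a hypothetical helper, and the `org.example.*` class names are placeholders rather than the real parser classes.

```python
def merge_service_files(contents):
    """Merge several service-file bodies into one.

    Comments and blank lines are dropped, duplicates are removed,
    and providers keep the order in which they were first seen.
    """
    seen = set()
    merged = []
    for text in contents:
        for line in text.splitlines():
            name = line.split("#", 1)[0].strip()
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return "".join(name + "\n" for name in merged)


# Placeholder contents standing in for the two conflicting files.
tika_parsers = "org.example.OfficeParser\norg.example.PdfParser\n"
vorbis = "# Vorbis parsers\norg.example.VorbisParser\norg.example.PdfParser\n"

combined = merge_service_files([tika_parsers, vorbis])
```

The combined text would then be written to META-INF/services/org.apache.tika.parser.Parser in the bundle, either by hand or from a build step.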

      My simple workaround, which probably removes Vorbis support completely, is this patch:

      tika-server/pom.xml.patch
      @@ -163,7 +168,7 @@
                 <instructions>
                   <Export-Package>org.apache.tika.*</Export-Package>
                   <Embed-Dependency>
      -                !jersey-server;scope=compile;inline=META-INF/services/**|au/**|javax/**|org/**|com/**|Resources/**|font_metrics.properties|repackage/**|schema*/**,
      +                !jersey-server;artifactId=!vorbis-java-tika;scope=compile;inline=META-INF/services/**|au/**|javax/**|org/**|com/**|Resources/**|font_metrics.properties|repackage/**|schema*/**,
                       jersey-server;scope=compile;inline=com/** |META-INF/services/com.sun*|META-INF/services/javax.ws.rs.ext.RuntimeDelegate
                   </Embed-Dependency>
      

          People

            Assignee: Unassigned
            Reporter: Graham Charters (gcc)