Tika / TIKA-893

Tika-server bundle includes wrong META-INF/services/org.apache.tika.parser.Parser, doesn't work

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1, 1.2
    • Fix Version/s: None
    • Component/s: packaging
    • Labels:
    • Environment:

      Apache Maven 2.2.1 (rdebian-6)
      Java version: 1.6.0_26
      Java home: /usr/lib/jvm/java-6-sun-1.6.0.26/jre
      Default locale: en_GB, platform encoding: UTF-8
      OS name: "linux" version: "3.0.0-17-generic-pae" arch: "i386" Family: "unix"

      Description

      Both vorbis-java-tika-0.1.jar and tika-parsers-1.1-SNAPSHOT.jar include different copies of META-INF/services/org.apache.tika.parser.Parser, which the auto-detecting parser needs to configure itself.

      AFAIK, only one of these can be included in a standalone OSGi JAR, since both files live at the same path.
      On my system at least, the Vorbis copy ends up in the JAR and the tika-parsers copy does not.
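      The collision can be modelled in a few lines of Java. This is a simplified sketch, not how Bnd or the maven-bundle-plugin is actually implemented: it assumes that inlining copies every dependency's entries into one path-keyed map, so a later copy of META-INF/services/org.apache.tika.parser.Parser replaces an earlier one. Which copy wins in a real build depends on the order in which the plugin processes dependencies. The class name InlineCollisionDemo and the provider entry org.example.VorbisParser are hypothetical.

      ```java
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical, simplified model of inlining: each JAR is reduced to a
      // map of entry path -> file contents, and inlining copies every entry
      // into a single bundle map. Not Bnd's real implementation; it only shows
      // why two different files at the same path cannot both survive.
      final class InlineCollisionDemo {
          static final String SERVICES_PATH = "META-INF/services/org.apache.tika.parser.Parser";

          // Assumption: later JARs overwrite earlier ones, because Map.putAll
          // replaces the value of an existing key. In a real build the winner
          // depends on the plugin's processing order.
          static Map<String, String> inline(List<Map<String, String>> jars) {
              Map<String, String> bundle = new LinkedHashMap<>();
              for (Map<String, String> jar : jars) {
                  bundle.putAll(jar);
              }
              return bundle;
          }
      }
      ```

      Feeding it a tika-parsers map followed by a Vorbis map leaves a single services entry containing only the Vorbis line, which matches the symptom described above.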

      This means the Tika server can auto-detect Vorbis files but not Microsoft Office files, which makes it effectively broken from my point of view.

      Unless Bnd (which is largely undocumented) provides some way to merge these files, I suggest one of the following:

      • excluding the one from vorbis-java-tika (easy but removes Vorbis auto-detection);
      • bundling vorbis-java-tika as an embedded JAR instead of inlined (might work?);
      • including a manually combined copy of both services files in tika-server/src/main/resources (ugly, requires maintenance);
      • finding or writing a maven plugin to merge these files (outside my maven-fu).
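      For the last option, the core of a merge step is small. The services file format read by java.util.ServiceLoader lists one fully qualified class name per line, treats everything after '#' as a comment, ignores surrounding whitespace and blank lines, and ignores duplicates, so merging amounts to taking the union of the entries. The sketch below uses a hypothetical ServicesMerger class that works on the file contents as strings and keeps the order of first appearance; wiring it into a Maven build is not shown.

      ```java
      import java.util.LinkedHashSet;
      import java.util.List;
      import java.util.Set;

      // Hypothetical helper: merges the contents of several
      // META-INF/services/org.apache.tika.parser.Parser files into one.
      final class ServicesMerger {

          static String merge(List<String> serviceFiles) {
              // LinkedHashSet drops duplicates and keeps first-seen order.
              Set<String> providers = new LinkedHashSet<>();
              for (String file : serviceFiles) {
                  // \R matches any line break (\n, \r\n, \r), Java 8+.
                  for (String line : file.split("\\R")) {
                      // Everything after '#' is a comment in the ServiceLoader format.
                      int hash = line.indexOf('#');
                      String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                      if (!name.isEmpty()) {
                          providers.add(name);
                      }
                  }
              }
              StringBuilder out = new StringBuilder();
              for (String provider : providers) {
                  out.append(provider).append('\n');
              }
              return out.toString();
          }
      }
      ```

      The merged text would then be written as the single services file in the bundle, so AutoDetectParser sees providers from both JARs.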

      My simple workaround, which probably removes Vorbis support completely, is this patch:

      tika-server/pom.xml.patch
      
      @@ -163,7 +168,7 @@
                 <instructions>
                   <Export-Package>org.apache.tika.*</Export-Package>
                   <Embed-Dependency>
      -                !jersey-server;scope=compile;inline=META-INF/services/**|au/**|javax/**|org/**|com/**|Resources/**|font_metrics.properties|repackage/**|schema*/**,
      +                !jersey-server;artifactId=!vorbis-java-tika;scope=compile;inline=META-INF/services/**|au/**|javax/**|org/**|com/**|Resources/**|font_metrics.properties|repackage/**|schema*/**,
                       jersey-server;scope=compile;inline=com/** |META-INF/services/com.sun*|META-INF/services/javax.ws.rs.ext.RuntimeDelegate
                   </Embed-Dependency>
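      To check which copy of the services file actually ended up in a built bundle (for example, before and after applying the patch), one could read the entry straight out of the JAR with java.util.jar.JarFile. This is a minimal sketch assuming the bundle is an ordinary JAR on disk; the class name BundleServicesCheck and the JAR path passed on the command line are placeholders.

      ```java
      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.jar.JarEntry;
      import java.util.jar.JarFile;

      // Hypothetical diagnostic: lists the parser classes named in the
      // bundle's Tika services file.
      final class BundleServicesCheck {
          static final String SERVICES_PATH = "META-INF/services/org.apache.tika.parser.Parser";

          // Returns the listed class names, or an empty list if the entry is missing.
          static List<String> listParsers(String jarPath) throws IOException {
              List<String> parsers = new ArrayList<>();
              try (JarFile jar = new JarFile(jarPath)) {
                  JarEntry entry = jar.getJarEntry(SERVICES_PATH);
                  if (entry == null) {
                      return parsers;
                  }
                  try (BufferedReader reader = new BufferedReader(
                          new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                      String line;
                      while ((line = reader.readLine()) != null) {
                          // Strip '#' comments and whitespace, skip blank lines.
                          int hash = line.indexOf('#');
                          String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                          if (!name.isEmpty()) {
                              parsers.add(name);
                          }
                      }
                  }
              }
              return parsers;
          }

          // Usage: java BundleServicesCheck <path-to-tika-server-jar> (placeholder path)
          public static void main(String[] args) throws IOException {
              for (String parser : listParsers(args[0])) {
                  System.out.println(parser);
              }
          }
      }
      ```

      If the Vorbis copy won, the output lists only the parsers from vorbis-java-tika; with the workaround above applied, it should list the tika-parsers entries instead.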
      

        Activity

        Chris Wilson created the issue.
        Nick Burch added a comment:

        See also my comment on TIKA-747: this affects tika-app too, but differently, since the tika-parsers file wins there.

        We still need someone who knows how the embedding works to take a look at this and change it so that the multiple versions of the services file are merged into one, rather than one JAR's version winning over the others.


          People

          • Assignee:
            Unassigned
            Reporter:
            Chris Wilson
          • Votes:
            0
            Watchers:
            0
