Tika / TIKA-1708

Detectors loaded from configuration files into CompositeDetector fail

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10
    • Fix Version/s: 1.11
    • Component/s: config, detector
    • Labels:
      None

      Description

      Loading individual Detectors from a configuration file, e.g.

      <detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
      <detector class="org.apache.tika.mime.MimeTypes"/>

      will cause them to be added to a CompositeDetector that fails to detect some file types, e.g. PST files.
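For context, the two entries above would normally sit inside a complete configuration file. The sketch below assumes the usual `<properties>`/`<detectors>` wrapper elements, which the report itself does not show. It uses only the JDK's XML parser to list the configured detector classes in order. The class name `DetectorConfigSketch` and its helper are hypothetical and are not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DetectorConfigSketch {
    // The two <detector> entries come from the report; the surrounding
    // <properties>/<detectors> wrapper is an assumption about the full file.
    static final String CONFIG =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<properties>\n"
      + "  <detectors>\n"
      + "    <detector class=\"org.apache.tika.parser.microsoft.POIFSContainerDetector\"/>\n"
      + "    <detector class=\"org.apache.tika.mime.MimeTypes\"/>\n"
      + "  </detectors>\n"
      + "</properties>\n";

    // Returns the class attribute of every <detector> element, in document order.
    static List<String> detectorClasses(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("detector");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("class"));
            }
            return names;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("Could not parse config", e);
        }
    }

    public static void main(String[] args) {
        for (String name : detectorClasses(CONFIG)) {
            System.out.println(name);
        }
    }
}
```

The order matters here only in that it shows both detectors are named explicitly, which is the code path this issue is about.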

      1. TIKA-1708.zip
        3 kB
        Justin Palmer

        Activity

        crynax Justin Palmer added a comment -

        Test case.

        crynax Justin Palmer added a comment -

        Attached test case fails...
        Running org.apache.tika.detect.ConfiguredDetectorTest
        Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.434 sec <<< FAILURE! - in org.apache.tika.detect.ConfiguredDetectorTest
        testLoadedCompositeConfig(org.apache.tika.detect.ConfiguredDetectorTest) Time elapsed: 0.029 sec <<< FAILURE!
        java.lang.AssertionError: null
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.apache.tika.detect.ConfiguredDetectorTest.parseDocument(ConfiguredDetectorTest.java:55)
        at org.apache.tika.detect.ConfiguredDetectorTest.testLoadedCompositeConfig(ConfiguredDetectorTest.java:77)

        gagravarr Nick Burch added a comment -

        Thanks for the unit test and config, very helpful! Slightly tweaked version added in r1696159.

        For some reason, when going down the second route, the mime types detector has no types. Now to investigate why!

        gagravarr Nick Burch added a comment -

        I think this is solved in r1696160. The problem was that, in the case of explicit detector names, we were creating a second, new, empty MimeTypes instance and using that in the detector. I've changed it to spot this case and instead use the MimeTypes instance the config has already created. Any chance you could test and confirm?
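The failure mode described in this comment can be illustrated with a small self-contained model. This is only a sketch of the pattern, not the actual TikaConfig code: `Registry`, `Config`, `loadDetectors`, and the `reuseExisting` flag are all hypothetical stand-ins, and the magic prefix and type string are used purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SharedRegistrySketch {
    static final String FALLBACK = "application/octet-stream";

    // Hypothetical stand-in for a type registry that doubles as a detector.
    static class Registry {
        private final Map<String, String> typesByMagic = new LinkedHashMap<>();

        void register(String magic, String type) {
            typesByMagic.put(magic, type);
        }

        String detect(String content) {
            for (Map.Entry<String, String> e : typesByMagic.entrySet()) {
                if (content.startsWith(e.getKey())) {
                    return e.getValue();
                }
            }
            return FALLBACK; // an empty registry always ends up here
        }
    }

    // Hypothetical stand-in for a loaded config that owns one populated registry.
    static class Config {
        final Registry registry = new Registry();

        Config() {
            registry.register("!BDN", "application/vnd.ms-outlook-pst");
        }

        // reuseExisting=false models the bug: an explicitly named registry
        // detector gets a fresh, empty instance. reuseExisting=true models
        // the fix: the already populated instance is shared.
        List<Registry> loadDetectors(List<String> names, boolean reuseExisting) {
            List<Registry> detectors = new ArrayList<>();
            for (String name : names) {
                if (name.equals("Registry")) {
                    detectors.add(reuseExisting ? registry : new Registry());
                }
            }
            return detectors;
        }
    }

    // Composite behaviour: the first detector that returns a specific type wins.
    static String detect(List<Registry> detectors, String content) {
        for (Registry d : detectors) {
            String type = d.detect(content);
            if (!type.equals(FALLBACK)) {
                return type;
            }
        }
        return FALLBACK;
    }

    static String run(boolean reuseExisting) {
        Config config = new Config();
        List<Registry> detectors =
            config.loadDetectors(Arrays.asList("Registry"), reuseExisting);
        return detect(detectors, "!BDN sample content");
    }

    public static void main(String[] args) {
        System.out.println("fresh empty registry: " + run(false));
        System.out.println("shared registry:      " + run(true));
    }
}
```

With the fresh instance the composite can only return the fallback type, because nothing was ever registered in it. Sharing the populated instance restores detection.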

        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-trunk-jdk1.7 #828 (See https://builds.apache.org/job/tika-trunk-jdk1.7/828/)
        TIKA-1708 If the Tika Config detector entry calls for MimeTypes, use the already created one, avoid creating a new empty one (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696160)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaDetectorConfigTest.java

        Outlook detection with custom config tests, based on work by Justin Palmer TIKA-1708 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696159)

        • /tika/trunk/tika-core/src/test/java/org/apache/tika/config/AbstractTikaConfigTest.java
        • /tika/trunk/tika-example/pom.xml
        • /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox/OutlookPSTParser.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaDetectorConfigTest.java
        • /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1708-detector-composite.xml
        • /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1708-detector-default.xml
        crynax Justin Palmer added a comment -

        Tested and confirmed. Thanks Nick!


          People

          • Assignee: Unassigned
          • Reporter: crynax Justin Palmer
          • Votes: 0
          • Watchers: 3
