TIKA-1702

Tika config xml support for detectors

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10
    • Component/s: config
    • Labels:
      None

      Description

      Currently, you can use the Tika Config XML to have very fine-grained control over which parsers to use, in what order, for which mimetypes, and so on.

      While the same decoration needs won't apply to (or be appropriate for) detectors, the ordering, composite and exclusion parts do. We should therefore add similar support for detectors in those areas.
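A minimal sketch of what an exclusion config for detectors might look like, embedded in a small Java program that only reads it back with the JDK's DOM parser. The element names (`detectors`, `detector`, `detector-exclude`) and the excluded class name are assumptions based on the parser config pattern and should be checked against the TIKA-1702 test resources; `DetectorConfigSketch` and `excludedDetectors` are hypothetical names used only for illustration. In Tika itself, such a file would normally be passed to `TikaConfig` rather than parsed by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DetectorConfigSketch {
    // Assumed config shape: DefaultDetector with one detector excluded.
    // Verify element and class names against the unit-test XML files.
    static final String CONFIG =
          "<properties>\n"
        + "  <detectors>\n"
        + "    <detector class=\"org.apache.tika.detect.DefaultDetector\">\n"
        + "      <detector-exclude class=\"org.apache.tika.parser.pkg.ZipContainerDetector\"/>\n"
        + "    </detector>\n"
        + "  </detectors>\n"
        + "</properties>\n";

    /** Returns the class names listed in detector-exclude elements, in document order. */
    static List<String> excludedDetectors(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("detector-exclude");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("class"));
            }
            return names;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("Could not parse config", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(excludedDetectors(CONFIG));
    }
}
```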

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-trunk-jdk1.7 #809 (See https://builds.apache.org/job/tika-trunk-jdk1.7/809/)
        Convert Translator config to the new pattern for TIKA-1702, and add unit tests for Translator xml config (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693747)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
        • /tika/trunk/tika-core/src/main/java/org/apache/tika/language/translate/DefaultTranslator.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaDetectorConfigTest.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaTranslatorConfigTest.java
        • /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1702-translator-default.xml
        • /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1702-translator-empty-default.xml
        • /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1702-translator-empty.xml

        Allow Detectors to be defined as excluded in Tika Config XML TIKA-1702 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693739)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaDetectorConfigTest.java

        TIKA-1702 Move the parser and detector creation logic to the config loader classes (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693733)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java

        TIKA-1702 CompositeDetector support for excludes, along the lines of the CompositeParser support (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693730)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java
        • /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/DefaultDetector.java
        • /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java

        TIKA-1702 Start moving to a loader class pattern for common Detector and Parser (+later others) (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693721)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaDetectorConfigTest.java

        More TIKA-1702 refactoring to bring detectors in line with parsers (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693717)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java

        TIKA-1702 Refactor some of the config parser loading to be more re-usable for detectors, and bring the method signature in line WRT Composite vs not (must always be composite) (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693713)

        • /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
        • /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/ProbabilisticMimeDetectionTestWithTika.java

        Start on detector config tests for TIKA-1702 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1693710)

        • /tika/trunk/tika-core/src/test/java/org/apache/tika/config/TikaConfigTest.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/AbstractTikaConfigTest.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaDetectorConfigTest.java
        • /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
        • /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1702-detector-blacklist.xml
        gagravarr Nick Burch added a comment -

        After a fair bit of refactoring of the logic in Tika Config, common code is now used for Parsers, Detectors and Translators. For Detectors, that means you can now create a CompositeDetector from config with custom ordering, and exclude Detectors from DefaultDetector. Config examples of this are available in the unit tests. That's all there as of r1693747.

        As we now have common logic for creating Parsers, Detectors and Translators from the Tika Config XML, expanding this support with options/parameters from the ongoing "Configuring parsers and translators" discussion should now be much easier.
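The two detector features described above, custom ordering and excludes, can be illustrated with a small self-contained sketch. This is not Tika's implementation: `SimpleDetector`, `PdfDetector`, `ZipDetector` and `Composite` are hypothetical stand-ins, and the "first non-fallback result wins" rule is a deliberate simplification of how a real composite might combine results.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CompositeDetectorSketch {
    static final String FALLBACK = "application/octet-stream";

    /** Hypothetical stand-in for a detector: maps leading bytes to a media type. */
    interface SimpleDetector {
        String detect(byte[] prefix);
    }

    static final class PdfDetector implements SimpleDetector {
        public String detect(byte[] prefix) {
            return startsWith(prefix, "%PDF") ? "application/pdf" : FALLBACK;
        }
    }

    static final class ZipDetector implements SimpleDetector {
        public String detect(byte[] prefix) {
            return startsWith(prefix, "PK") ? "application/zip" : FALLBACK;
        }
    }

    /** True if data begins with the ASCII bytes of magic. */
    static boolean startsWith(byte[] data, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (data.length < m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (data[i] != m[i]) {
                return false;
            }
        }
        return true;
    }

    /** Tries detectors in the given order, skipping any whose class is excluded. */
    static final class Composite implements SimpleDetector {
        private final List<SimpleDetector> detectors = new ArrayList<>();

        Composite(List<SimpleDetector> ordered, Set<Class<?>> excluded) {
            for (SimpleDetector d : ordered) {
                if (!excluded.contains(d.getClass())) {
                    detectors.add(d);
                }
            }
        }

        public String detect(byte[] prefix) {
            // Simplification: the first detector with a specific answer wins.
            for (SimpleDetector d : detectors) {
                String type = d.detect(prefix);
                if (!FALLBACK.equals(type)) {
                    return type;
                }
            }
            return FALLBACK;
        }
    }

    public static void main(String[] args) {
        List<SimpleDetector> ordered = new ArrayList<>();
        ordered.add(new PdfDetector());
        ordered.add(new ZipDetector());
        Set<Class<?>> excluded = new HashSet<>();
        excluded.add(ZipDetector.class);

        byte[] zipBytes = "PK-data".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new Composite(ordered, new HashSet<Class<?>>()).detect(zipBytes));
        System.out.println(new Composite(ordered, excluded).detect(zipBytes));
    }
}
```

Keeping the exclusion check in the constructor means the filtered, ordered list is built once, so each `detect` call only walks the detectors that are actually active.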


          People

          • Assignee:
            Unassigned
            Reporter:
            gagravarr Nick Burch
          • Votes:
            0
            Watchers:
            2
