Tika / TIKA-2460

Possibility to add custom-mimetypes.xml (and/or other configuration files) from a location outside the classpath

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.11
    • Fix Version/s: None
    • Component/s: mime
    • Labels: None

      Description

      I would like to be able to pass custom-mimetypes.xml to Tika from outside the classpath, because that is more flexible.

      Our application is based on Eclipse/OSGi and is composed of multiple plugins/bundles.
      One of these plugins contains the Tika library (almost all settings are default).
      We usually provide some configuration to these plugins from outside.
      We would like to do the same with Tika, because we recently encountered https://issues.apache.org/jira/browse/TIKA-2443 and had to provide a custom type to work around a mismatched detection.
      There might be other potential mismatches, and it would be good to be able to pass configuration to Tika from outside our application. In the OSGi setup, however, "on the classpath" means inside the plugin folder,
      and from our point of view that is not a good place: these plugins get replaced at every release, so the patch would have to be reapplied every time.
      This is why it would be good if Tika itself offered this possibility. Thank you.

      1. TIKA-2460.patch
        4 kB
        Luis Filipe Nassif
      2. TIKA-2460.patch
        2 kB
        Luis Filipe Nassif

        Activity

        gagravarr Nick Burch added a comment -

        For $DAYJOB, we've configured Tomcat to have ${catalina.base}/shared/classes as a shared loader folder (in catalina.properties). We then drop machine-specific configuration files and override files into there, so they're picked up by running applications + versioned independently of the deployed apps. Could you not do the same / similar?
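The shared-folder approach relies on override files being found as class-loader resources. As a minimal sketch, assuming that is how the overrides are discovered, the following lists every copy of the custom mimetypes file visible to a class loader; a copy dropped into a directory on a shared loader (for Tomcat, the `shared.loader` entry in catalina.properties) would show up here. The class name `ClasspathOverrides` and its method are hypothetical, and the resource path is the one mentioned later in this thread.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ClasspathOverrides {
    /** Resource location of the override file, as described in this thread. */
    static final String CUSTOM_MIMETYPES = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Lists every copy of the resource visible to the given class loader. */
    static List<URL> findAll(ClassLoader loader, String resource) throws IOException {
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (URL url : findAll(loader, CUSTOM_MIMETYPES)) {
            System.out.println("Found override: " + url);
        }
    }
}
```

Running this inside the deployed application is a quick way to check whether a file placed in the shared folder is actually visible to the class loader in use.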

        lfcnassif Luis Filipe Nassif added a comment -

        Hi Nick Burch, I also had this requirement in the past and still have it. It is difficult to depend on environment variables in portable apps like ours. Also, we would like to better expose custom-mimetypes.xml to our advanced users, because it is a bit hidden inside the nested org/apache/tika/mime folders.

        I think we can offer a Java system property, similar to tika.config, for that. I will prepare a patch and attach it here for your review.
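As a rough illustration of this proposal, the sketch below reads an optional file path from a Java system property and treats an unset or blank value as "no external file". The property name `example.custom-mimetypes` and the class `ExternalMimeTypesLocator` are hypothetical, chosen only for this example; they are not taken from the patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class ExternalMimeTypesLocator {
    /** Hypothetical property name, used only for this illustration. */
    static final String SYS_PROP = "example.custom-mimetypes";

    /** Returns the configured path, or empty if the property is unset or blank. */
    static Optional<Path> locate() {
        String value = System.getProperty(SYS_PROP);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(value.trim()));
    }

    public static void main(String[] args) {
        System.out.println(locate()
                .map(p -> "External mimetypes file: " + p.toAbsolutePath())
                .orElse("No external mimetypes file configured"));
    }
}
```

A portable application could then be started with something like `java -Dexample.custom-mimetypes=conf/custom-mimetypes.xml ...`, keeping the file outside any plugin folder and without relying on environment variables.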

        lfcnassif Luis Filipe Nassif added a comment -

        Draft of the patch. Will write a unit test after review.

        lfcnassif Luis Filipe Nassif added a comment -

        The patch is missing a null check. Will add it together with the unit test.

        lfcnassif Luis Filipe Nassif added a comment -

        Final version of the patch. Do you have any recommendations, Nick Burch?

        gagravarr Nick Burch added a comment -

        I'd tweak the comment to "System property to set a path to an additional external custom mimetypes XML file to be loaded", to make it clear that the file name doesn't matter, and that it won't affect the loading of any custom-mimetypes.xml files already on the classpath.

        Probably also tweak the next comment addition to "It will also load custom mimetypes from the system property {@link #CUSTOM_MIMES_SYS_PROP}, if specified".

        Finally, if you give a non-existent path to tika.config, you get a helpful error message: "Specified Tika configuration not found: " + config. Should we also add a file-exists check and throw a similarly specific, helpful exception when the specified mimetypes file is missing?
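The fail-fast check suggested here could look roughly like the following sketch. The class name, method name, exception type, and exact message wording are assumptions made for illustration (the wording is modeled on the tika.config message quoted above); the committed code may differ.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ExternalMimeTypesCheck {
    /**
     * Returns the path if it points to an existing regular file;
     * otherwise throws an exception whose message names the missing path.
     */
    static Path requireExisting(String configured) throws FileNotFoundException {
        Path path = Paths.get(configured);
        if (!Files.isRegularFile(path)) {
            throw new FileNotFoundException(
                    "Specified custom mimetypes file not found: " + configured);
        }
        return path;
    }
}
```

Including the configured value in the message lets a user see immediately which path was looked up, rather than getting a generic I/O or parsing error later.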

        lfcnassif Luis Filipe Nassif added a comment -

        For sure! Will do the adjustments. Thanks!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Tika-trunk #1360 (See https://builds.apache.org/job/Tika-trunk/1360/)
        TIKA-2460: load custom mimetypes XML from sys prop (lfcnassif: https://github.com/apache/tika/commit/70ca280f11fe4127df290b8027c6bc1d5180271f)

        • (edit) tika-core/src/main/java/org/apache/tika/mime/MimeTypesFactory.java
        • (edit) CHANGES.txt
        • (edit) tika-core/src/test/java/org/apache/tika/mime/MimeTypesReaderTest.java
        • (add) tika-core/src/test/resources/org/apache/tika/mime/external-mimetypes.xml
        lfcnassif Luis Filipe Nassif added a comment -

        It looks like I have no permission to close issues not created by me. Could you take a look, Chris A. Mattmann or Dave Meikle?


          People

          • Assignee: Unassigned
          • Reporter: VioricaVisan Viorica Visan