Tika / TIKA-2490

Turn off stderr warnings in Tika-app

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: None
    • Component/s: app
    • Labels:
      None

      Description

      Let's get rid of the stderr messages in tika-app and confirm that users can turn off warnings via tika-config.xml

          Activity

          markus17 Markus Jelsma added a comment -

          Good enough! Thanks!

          tallison@mitre.org Tim Allison added a comment - - edited

          I'd add that property and that tika-config.xml to your default settings. That will also pave the way for users to update the tika-config.xml as they wish. The config file will already be there, and it will be "live" because of the added property.

          markus17 Markus Jelsma added a comment -

          OK, so what should we do in Nutch? By default, no tika-config.xml is loaded.

          tallison@mitre.org Tim Allison added a comment -

          It does work!

          If I add

          <property>
           <name>tika.config.file</name>
           <value>tika-config.xml</value>
          </property>
          

          to nutch-default.xml

          and tika-config.xml consists of:

          <properties>
              <service-loader initializableProblemHandler="ignore"/>
          </properties>
          

          The warning goes away...as it should!

          tallison@mitre.org Tim Allison added a comment -

          Thank you! I can now reproduce this...at least.

          How are you specifying tika.config.file and where is that file? I'm getting nothing with grep -R initializableProblemHander . or grep -R service-loader .

          markus17 Markus Jelsma added a comment -

          I attached a Nutch patch for upgrading to 1.16, modified to work with 1.17-SNAPSHOT.

          Steps to reproduce:

          1. unpack Nutch src
          2. patch -p0 < the patch
          3. rm src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/BoilerpipeExtractorRepository.java
          4. ant
          5. cd runtime/local
          6. bin/nutch indexchecker http://tika.apache.org/
          markus17 Markus Jelsma added a comment -

          I still get:

          Nov 14, 2017 1:33:11 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: org.xerial's sqlite-jdbc is not loaded.
          Please provide the jar on your classpath to parse sqlite files.
          See tika-parsers/pom.xml for the correct version.
          

          with our custom parser that is based on Tika 1.17-SNAPSHOT. This is not the built-in TikaParser.

          tallison@mitre.org Tim Allison added a comment - - edited

          Markus Jelsma, are you still getting the warning for sqlite? I haven't gotten a chance to get Nutch up and running in my dev environment, and I can't figure out how you're getting one warning but not the warnings for the image jars... Thank you!

          tallison@mitre.org Tim Allison added a comment -

          Will do. Let me try to get Nutch up and running in my IDE, and I'll debug the sqlite warning while I'm at it...

          markus17 Markus Jelsma added a comment -

          If you have a patch, of course, feel free to open a ticket!

          tallison@mitre.org Tim Allison added a comment -

          Sidenote: Markus Jelsma, um...as I look at the Nutch code and try to reproduce this problem, it looks like your current method wouldn't handle embedded files. This isn't likely a problem for html, but if there's a zip file or an msg file with attachments or https://github.com/apache/tika/blob/master/tika-server/src/test/resources/test_recursive_embedded.docx the users would silently get no content from the embedded documents.

          Should I open a PR?

          markus17 Markus Jelsma added a comment -

          Yes!

          tallison@mitre.org Tim Allison added a comment - - edited

          Is customTikaConfig a URL? Sorry, I found the location in Nutch's source. Yes, it is a URL.

          markus17 Markus Jelsma added a comment -

          No, old Nutch style:

          tikaConfig = new TikaConfig(customTikaConfig, this.getClass().getClassLoader());
          Parser parser = tikaConfig.getParser(MediaType.parse(mimeType));
          
          tallison@mitre.org Tim Allison added a comment -

          No. That shouldn't happen. Are you doing this:

          TikaConfig config = new TikaConfig(customConfFile.toURI().toURL(), this.getClass().getClassLoader());
          Parser p = new AutoDetectParser(config);
          

          Let me try to reproduce this.

          markus17 Markus Jelsma added a comment -

          Hello Tim Allison, that works. But we still see:

          Nov 06, 2017 1:40:56 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: org.xerial's sqlite-jdbc is not loaded.
          Please provide the jar on your classpath to parse sqlite files.
          See tika-parsers/pom.xml for the correct version
          

          Is this expected?

          tallison@mitre.org Tim Allison added a comment -

          Markus Jelsma, give the nightly build a try and let us know if this now works for you: https://builds.apache.org/job/Tika-trunk/1384/

          tallison@mitre.org Tim Allison added a comment -

          I removed the printing of initializable problems to stderr. I also fixed what I think is a bug in TikaConfig when it is initialized via a File, and this now actually works.

          Try this in your tika-config.xml file to turn off any complaints:

          <properties>
              <service-loader initializableProblemHandler="ignore"/>
          </properties>
          

          Note that if you want throw to work, you also have to set loadErrorHandler to throw.
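
          As a sketch of that last note: assuming the throw value is written in the same lowercase style as ignore above, a tika-config.xml that fails fast instead of staying silent might look like the following. It is untested here, so check it against your Tika version.

          ```xml
          <properties>
              <!-- Assumption: both attributes accept "throw". Per the note above,
                   initializableProblemHandler="throw" only takes effect when
                   loadErrorHandler is also set to "throw". -->
              <service-loader initializableProblemHandler="throw" loadErrorHandler="throw"/>
          </properties>
          ```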

          tallison@mitre.org Tim Allison added a comment -

          cc Markus Jelsma, https://lists.apache.org/thread.html/76a0fdc5b26fd3de4a1141a9fa52dcd166a9fbcbfb83cc2a5e459314@%3Cuser.tika.apache.org%3E

            People

            • Assignee:
              tallison@mitre.org Tim Allison
              Reporter:
              tallison@mitre.org Tim Allison