  Project: Tika, issue TIKA-2491

Cannot use TikaConfig


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.17
    • Component/s: None
    • Labels: None

    Description

      I need to use a custom tika-config.xml in Nutch. Nutch supports this, but I can't get it to work.

      This is how Nutch gets the parser:
      Parser parser = tikaConfig.getParser(MediaType.parse(mimeType));

      When no custom config file is specified, the config is created with:
      new TikaConfig(this.getClass().getClassLoader());

      When I specify a custom config file, it is created with:
      tikaConfig = new TikaConfig(conf.getResource(customConfFile));
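The two construction paths and the type-based lookup described above can be sketched against tika-core 1.x as follows. This is not Nutch's code: the class `TikaParserLookup` and its methods are hypothetical, and the lookup goes through `TikaConfig.getParser()` and `CompositeParser.getParsers()` rather than the exact call Nutch makes. The sketch assumes the top-level parser is a `CompositeParser` and normalizes the content type (dropping parameters such as a charset) before the lookup.

```java
import java.io.IOException;
import java.net.URL;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.exception.TikaException;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.CompositeParser;
import org.apache.tika.parser.Parser;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Nutch or Tika.
final class TikaParserLookup {

    // Mirrors the two construction paths in the report:
    // no custom file -> TikaConfig(ClassLoader), custom file -> TikaConfig(URL).
    static TikaConfig loadConfig(URL customConfig, ClassLoader loader)
            throws TikaException, IOException, SAXException {
        if (customConfig == null) {
            return new TikaConfig(loader);
        }
        return new TikaConfig(customConfig);
    }

    // Returns the parser mapped to mimeType in the top-level composite parser, or null.
    static Parser findParser(TikaConfig config, String mimeType) {
        MediaType type = MediaType.parse(mimeType);
        if (type == null) {
            return null; // MediaType.parse returns null for strings it cannot parse
        }
        Parser root = config.getParser();
        if (!(root instanceof CompositeParser)) {
            return null; // this sketch only handles a composite top-level parser
        }
        CompositeParser composite = (CompositeParser) root;
        // Drop parameters such as "; charset=UTF-8" and resolve aliases, which is how
        // CompositeParser normally keys its type-to-parser map
        MediaType key = composite.getMediaTypeRegistry().normalize(type.getBaseType());
        return composite.getParsers().get(key);
    }
}
```

A `null` from `findParser` only means that no parser is mapped to that exact type at the top level. The composite returned by `getParser()` can still be given the document directly, since it picks a child parser from the Content-Type set in the `Metadata`.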

      getParser always returns null with a custom config file, yet there are no errors or exceptions. The config itself is fine: it fixed the encoding problem in a parser outside of Nutch (thanks again, Timothy), but I need to get it to work in Nutch too.
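One way to see why the lookup comes back null is to dump the type-to-parser map of each config and compare the default config with the custom one. The sketch below assumes tika-core 1.x on the classpath; `ParserMapDump` and `describe` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.CompositeParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;

// Hypothetical diagnostic helper, not part of Nutch or Tika.
final class ParserMapDump {

    // Returns "media type -> parser class" for the config's top-level parser, sorted by type.
    static Map<String, String> describe(TikaConfig config) {
        Map<String, String> out = new TreeMap<>();
        Parser root = config.getParser();
        if (root instanceof CompositeParser) {
            // One entry per normalized media type, mapped to the child parser that handles it
            for (Map.Entry<MediaType, Parser> e : ((CompositeParser) root).getParsers().entrySet()) {
                out.put(e.getKey().toString(), e.getValue().getClass().getName());
            }
        } else {
            // A single non-composite parser: list the types it claims to support
            for (MediaType type : root.getSupportedTypes(new ParseContext())) {
                out.put(type.toString(), root.getClass().getName());
            }
        }
        return out;
    }
}
```

Diffing the key sets of `describe(new TikaConfig(loader))` and `describe(new TikaConfig(customUrl))` would show which media types, if any, the custom config no longer maps at the top level.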

      Our external project does:
      AutoDetectParser parser = new AutoDetectParser(tikaConfig);
      parser.parse(..);

      and it just works! If I do the same in Nutch, however, nothing is passed through the content handlers and the parse result is completely empty.
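For comparison, the external project's working approach (an `AutoDetectParser` built from the same `TikaConfig`) might look like this in full. The class name, the unlimited `BodyContentHandler`, and registering the parser in the `ParseContext` are assumptions of this sketch, not details from the report.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating AutoDetectParser with a given TikaConfig.
final class AutoDetectWithConfig {

    // Detects the type, parses the stream, and returns the body text.
    // The caller remains responsible for closing the stream.
    static String extractText(TikaConfig config, InputStream in)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser(config);
        // -1 disables BodyContentHandler's default 100,000-character write limit
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        // Registering the parser lets embedded documents be parsed with the same configuration
        context.set(Parser.class, parser);
        parser.parse(in, handler, metadata, context);
        return handler.toString();
    }
}
```

Unlike a lookup by media type, `AutoDetectParser` detects the type itself and always has a parser to dispatch to, which may be why this path produces output where the lookup returns null.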

      Attachments

        1. tika-config.xml (0.8 kB, attached by Markus Jelsma)


            People

              Assignee: Unassigned
              Reporter: Markus Jelsma (markus17)
              Votes: 0
              Watchers: 3
