  1. Tika
  2. TIKA-2491

Cannot use TikaConfig


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.17
    • Component/s: None
    • Labels:
      None

      Description

      I need to use a custom tika-config.xml in Nutch, which supports one, but I can't get it to work.

      This is how Nutch gets the parser:
      Parser parser = tikaConfig.getParser(MediaType.parse(mimeType));

      When no custom config is specified, the config is:
      new TikaConfig(this.getClass().getClassLoader());

      When I specify a custom config, it is:
      tikaConfig = new TikaConfig(conf.getResource(customConfFile));

      With a custom config file, getParser always returns null. There are no errors or exceptions. The config itself is fine: it fixed the encoding problem in a parser outside of Nutch (thanks again, Timothy), but I need it to work in Nutch too.

      Our external project does:
      AutoDetectParser parser = new AutoDetectParser(tikaConfig);
      parser.parse(..);

      and it just works! If I do the same in Nutch, however, nothing is passed through the content handlers and the parse result is completely empty.
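      Since the failure is silent, one thing worth ruling out is whether the URL handed to TikaConfig actually points at a readable, well-formed config. The following is a minimal diagnostic sketch using only the JDK, not Tika or Nutch code. The class name ConfigResourceCheck and the method rootElementOf are hypothetical, and it assumes the config is looked up on the classpath, which may not match how Nutch resolves it. A Tika config file normally has a properties root element, so any other result (or null) would point at the lookup rather than at the parser.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: checks that a config URL resolves and parses as XML.
final class ConfigResourceCheck {

    /**
     * Returns the name of the root XML element at the given URL,
     * or null when the URL itself is null (resource lookup found nothing).
     */
    static String rootElementOf(URL url) throws Exception {
        if (url == null) {
            return null; // the lookup did not find the resource
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        try (InputStream in = url.openStream()) {
            Document doc = builder.parse(in);
            return doc.getDocumentElement().getNodeName();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the config is expected on the classpath under this name.
        String name = args.length > 0 ? args[0] : "tika-config.xml";
        URL url = ConfigResourceCheck.class.getClassLoader().getResource(name);
        String root = rootElementOf(url);
        System.out.println(name + " -> " + url + ", root element: " + root);
    }
}
```

      Running this with the same classpath as the failing job shows which file, if any, the name resolves to. It does not exercise TikaConfig itself, so a clean result only narrows the problem down.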

        Attachments

        1. tika-config.xml
          0.8 kB
          Markus Jelsma

              People

              • Assignee:
                Unassigned
                Reporter:
                markus17 Markus Jelsma
              • Votes:
                0
                Watchers:
                3
