Tika / TIKA-2713

Setting classpath for tika-server and opennlp processor models

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.17, 1.18
    • Fix Version/s: None
    • Component/s: server
    • Labels: None

    Description

      I can't seem to set the classpath for the tika-server so that the opennlp models are detected correctly.

      I've followed the instructions here:

      https://wiki.apache.org/tika/TikaAndNER

      (using the tika-server jar in place of tika-app, since it looked like it contained everything required)

      I have created the following folder structure

      tika
      `-- tika-ner-resources
          `-- org
              `-- apache
                  `-- tika
                      `-- parser
                          `-- ner
                              `-- opennlp
                                  |-- ner-location.bin
                                  |-- ner-organization.bin
                                  `-- ner-person.bin

      Running:

      java -classpath tika/tika-ner-resources -jar tika-server-1.18.jar --config /etc/tika-config.xml -enableUnsecureFeatures -h 0.0.0.0

      and issuing
      curl -v -XPUT --data-binary @test.pdf http://localhost:9998/tika --header "Accept: text/plain" --header "Content-Type: application/pdf"

      results in

      INFO going to load, instantiate and bind the instance of org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
      WARN Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-location.bin using class loader
      INFO LOCATION NER : Available for service ? false
      WARN Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-organization.bin using class loader
      INFO ORGANIZATION NER : Available for service ? false
      WARN Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-date.bin using class loader
      INFO DATE NER : Available for service ? false
      WARN Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-money.bin using class loader
      INFO MONEY NER : Available for service ? false
      WARN Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-person.bin using class loader
      INFO PERSON NER : Available for service ? false
      WARN Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-percentage.bin using class loader
      INFO PERCENT NER : Available for service ? false
      WARN Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-time.bin using class loader
      INFO TIME NER : Available for service ? false
      INFO org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser is available ? false
      INFO going to load, instantiate and bind the instance of org.apache.tika.parser.ner.regex.RegexNERecogniser
      INFO org.apache.tika.parser.ner.regex.RegexNERecogniser is available ? false
      INFO Number of NERecognisers in chain 0
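
The log above asks for seven model files, while the folder structure lists only three. As a minimal sketch (the function name `check_models` and the default root are illustrative, not part of Tika), the following reports which of the `ner-*.bin` files named in the WARN lines exist under a given resources root:

```shell
#!/bin/sh
# Model names taken from the WARN lines in the log output above.
MODELS="location organization date money person percentage time"
# Resource path the class loader is asked for, as shown in the log.
PKG_PATH="org/apache/tika/parser/ner/opennlp"

# check_models ROOT: print "found" or "missing" for each expected model file
# under ROOT/org/apache/tika/parser/ner/opennlp. (Hypothetical helper.)
check_models() {
  root="$1"
  for name in $MODELS; do
    file="$root/$PKG_PATH/ner-$name.bin"
    if [ -f "$file" ]; then
      echo "found   $file"
    else
      echo "missing $file"
    fi
  done
}

# Default root matches the folder structure in this report.
check_models "${1:-tika/tika-ner-resources}"
```

A file reported as "found" here only confirms the on-disk layout; it does not show that the running JVM has that directory on its class path.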

       

      The only thing that seems to work is re-packing the jar, adding the contents of the tika/tika-ner-resources directory (i.e. org/blah/blah/*.bin) to it. The curl command then executes without any issues.

      Does anyone have any ideas?
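
One likely cause: the `java` launcher documents that when `-jar` is used, the JAR file is the source of all user classes and other class path settings are ignored, so `-classpath tika/tika-ner-resources` would have no effect in the command above. A hedged sketch of an alternative is to put both the resources directory and the jar on the class path and name the main class explicitly. The main class `org.apache.tika.server.TikaServerCli` is an assumption for tika-server 1.x and should be checked against the `Main-Class` entry in the jar's `META-INF/MANIFEST.MF`; the helper `tika_server_cmd` is illustrative and only prints the command.

```shell
#!/bin/sh
# Paths copied from the command in this report.
NER_RESOURCES="tika/tika-ner-resources"
TIKA_JAR="tika-server-1.18.jar"
# Assumption: main class of tika-server 1.x; verify in META-INF/MANIFEST.MF.
MAIN_CLASS="org.apache.tika.server.TikaServerCli"

# Print a launch command that avoids -jar, so that -classpath is honoured.
# ':' is the class path separator on Unix-like systems (';' on Windows).
tika_server_cmd() {
  echo "java -classpath ${NER_RESOURCES}:${TIKA_JAR} ${MAIN_CLASS} --config /etc/tika-config.xml -enableUnsecureFeatures -h 0.0.0.0"
}

tika_server_cmd
```

The printed line can be copied and run from the directory that contains both `tika/` and the server jar. If the models are then found, the WARN lines for the three installed models should no longer appear.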

       

       

       

       

          People

            Assignee: Unassigned
            Reporter: Badger