Tika / TIKA-1816

Lenient testing for NamedEntityParser

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13
    • Component/s: parser
    • Labels:

      Description

      NamedEntityParser has hard setup requirements: the NER models must be downloaded from remote servers and added to the classpath.
      These model files are huge and hence are not added to source control.
      As a result, the tests are likely to fail in environments where the models cannot be fetched.

      Make a best effort to set up the tests, but in the worst case skip the tests instead of failing the whole build.
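
One way to read the "skip instead of fail" idea, as a minimal sketch and not the actual patch: check whether a model file is visible on the classpath before running a test that depends on it. The class name `ModelCheck` and the helper `isOnClasspath` are hypothetical, and the resource path is assumed from the OpenNLP model location used by tika-parsers. In a JUnit 4 test, the resulting boolean could be handed to `org.junit.Assume.assumeTrue(...)` so that a missing model marks the test as skipped.

```java
public class ModelCheck {

    // Returns true when the named resource can be found by this class's class loader.
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = ModelCheck.class.getClassLoader();
        return loader.getResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        // Assumed resource path; adjust to wherever the models are placed.
        String model = "org/apache/tika/parser/ner/opennlp/ner-person.bin";
        if (isOnClasspath(model)) {
            System.out.println("Model found, NER tests can run: " + model);
        } else {
            // Skip rather than fail; in JUnit 4 this is where Assume.assumeTrue would go.
            System.out.println("Model missing, skipping NER tests: " + model);
        }
    }
}
```

The point of the design is that a missing model is treated as an unmet precondition of the environment, not as a defect in the parser.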

        Activity

        githubbot ASF GitHub Bot added a comment -

        GitHub user thammegowda opened a pull request:

        https://github.com/apache/tika/pull/68

        FIX for TIKA-1816 by Thamme Gowda

        Lenient testing for `NamedEntityParser`

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/thammegowda/tika TIKA-1816

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tika/pull/68.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #68


        commit 865de584be7cda0ed34c677f5bff5bb87b7a6996
        Author: Thamme Gowda <tgowdan@gmail.com>
        Date: 2015-12-21T01:14:56Z

        Lenient testing for NamedEntityParser


        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tika/pull/68

        chrismattmann Chris A. Mattmann added a comment -

        Thanks, Thamme Gowda, this should fix the unit tests now!

        [chipotle:~/tmp/tika1.12] mattmann% svn commit -m "Fix for TIKA-1816: Lenient testing for NamedEntityParser contributed by Thamme Gowda <tgowdan@gmail.com> this closes #68"
        Sending        CHANGES.txt
        Sending        tika-parsers/src/test/java/org/apache/tika/parser/ner/NamedEntityParserTest.java
        Transmitting file data ..
        Committed revision 1721096.
        [chipotle:~/tmp/tika1.12] mattmann% 
        
        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-trunk-jdk1.7 #895 (See https://builds.apache.org/job/tika-trunk-jdk1.7/895/)
        Fix for TIKA-1816: Lenient testing for NamedEntityParser contributed by Thamme Gowda <tgowdan@gmail.com> this closes #68 (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1721096)

        • trunk/CHANGES.txt
        • trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/NamedEntityParserTest.java
        tallison@mitre.org Tim Allison added a comment -

        Thank you for adding the leniency!

        I've gotten around this for now by manually adding the test files (I'm behind a proxy and can't figure out how to configure the proxy info for the gmaven plugin to perform the get).

        When I did a recent pull of Tika into a fresh directory, I'm still getting the message below. It looks like the issue is with the gmaven-plugin, not just the unit test.

        [INFO] ------------------------------------------------------------------------
        [INFO] Building Apache Tika parsers 1.12-SNAPSHOT
        [INFO] ------------------------------------------------------------------------
        [INFO] 
        [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ tika-parsers ---
        [INFO] 
        [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ tika-parsers ---
        [INFO] Using 'UTF-8' encoding to copy filtered resources.
        [INFO] Copying 9 resources
        [INFO] Copying 3 resources
        [INFO] 
        [INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ tika-parsers ---
        [INFO] Changes detected - recompiling the module!
        [INFO] Compiling 195 source files to C:\Users\tallison\Idea Projects\tika-github-test\tika-parsers\target\classes
        [INFO] /C:/Users/tallison/Idea Projects/tika-github-test/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentMetaParser.java: Some input files use or override a deprecated API.
        [INFO] /C:/Users/tallison/Idea Projects/tika-github-test/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentMetaParser.java: Recompile with -Xlint:deprecation for details.
        [INFO] /C:/Users/tallison/Idea Projects/tika-github-test/tika-parsers/src/main/java/org/apache/tika/parser/video/FLVParser.java: Some input files use unchecked or unsafe operations.
        [INFO] /C:/Users/tallison/Idea Projects/tika-github-test/tika-parsers/src/main/java/org/apache/tika/parser/video/FLVParser.java: Recompile with -Xlint:unchecked for details.
        [INFO] 
        [INFO] --- gmaven-plugin:1.0:execute (testSetup) @ tika-parsers ---
        GET : http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin -> tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\ner-person.bin
        [INFO] ------------------------------------------------------------------------
        [INFO] Reactor Summary:
        [INFO] 
        [INFO] Apache Tika parent ................................ SUCCESS [3.780s]
        [INFO] Apache Tika core .................................. SUCCESS [1:34.459s]
        [INFO] Apache Tika parsers ............................... FAILURE [21.944s]
        [INFO] Apache Tika XMP ................................... SKIPPED
        [INFO] Apache Tika serialization ......................... SKIPPED
        [INFO] Apache Tika batch ................................. SKIPPED
        [INFO] Apache Tika application ........................... SKIPPED
        [INFO] Apache Tika OSGi bundle ........................... SKIPPED
        [INFO] Apache Tika translate ............................. SKIPPED
        [INFO] Apache Tika server ................................ SKIPPED
        [INFO] Apache Tika examples .............................. SKIPPED
        [INFO] Apache Tika Java-7 Components ..................... SKIPPED
        [INFO] Apache Tika ....................................... SKIPPED
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD FAILURE
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 2:01.247s
        [INFO] Finished at: Fri Jan 08 09:27:01 EST 2016
        [INFO] Final Memory: 60M/663M
        [INFO] ------------------------------------------------------------------------
        [ERROR] Failed to execute goal org.codehaus.groovy.maven:gmaven-plugin:1.0:execute (testSetup) on project tika-parsers: java.net.ConnectException: Connection refused: connect -> [Help 1]
        [ERROR] 
        [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
        [ERROR] Re-run Maven using the -X switch to enable full debug logging.
        [ERROR] 
        [ERROR] For more information about the errors and possible solutions, please read the following articles:
        [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
        [ERROR] 
        [ERROR] After correcting the problems, you can resume the build with the command
        [ERROR]   mvn <goals> -rf :tika-parsers
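
The failure above comes from the download step itself, so leniency has to apply there as well as in the unit test. A minimal Java sketch of that idea follows, assuming a hypothetical helper `tryDownload`; the real setup script in this issue is a Groovy file, and this is not that script. The helper reports failure through its return value so the caller can carry on without the model.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class LenientDownload {

    // Tries to copy the URL to the target file; returns false instead of throwing on I/O failure.
    static boolean tryDownload(String url, Path target) {
        try (InputStream in = new URL(url).openStream()) {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            // java.net.ConnectException is an IOException, so a refused connection lands here.
            System.err.println("WARN: could not fetch " + url + " : " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = tryDownload(
                "http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin",
                Paths.get("target", "ner-person.bin"));
        System.out.println(ok ? "Download complete." : "Continuing without the model.");
    }
}
```

With this shape, a refused connection produces a warning and a `false` result, and the tests that need the model can then be skipped.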
        
        thammegowda Thamme Gowda added a comment - - edited

        Tim Allison Thanks for reporting.
        Please test the provided patch with your proxy setup and let me know if there are any issues.

        You are welcome to make any modifications to the model downloader program. As of now, the downloader uses the first active proxy from Maven's settings.
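
The "first active proxy" rule can be sketched in Java roughly as below. This illustrates the selection logic only and is not the downloader's actual code: `ProxySetting` is a hypothetical stand-in for one proxy entry of Maven's settings, and the host names are placeholders.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Arrays;
import java.util.List;

public class FirstActiveProxy {

    // Hypothetical stand-in for one proxy entry of Maven's settings.
    static final class ProxySetting {
        final boolean active;
        final String host;
        final int port;

        ProxySetting(boolean active, String host, int port) {
            this.active = active;
            this.host = host;
            this.port = port;
        }
    }

    // Returns an HTTP proxy for the first active entry, or Proxy.NO_PROXY when none is active.
    static Proxy select(List<ProxySetting> settings) {
        for (ProxySetting s : settings) {
            if (s.active) {
                // createUnresolved avoids a DNS lookup while building the address.
                return new Proxy(Proxy.Type.HTTP,
                        InetSocketAddress.createUnresolved(s.host, s.port));
            }
        }
        return Proxy.NO_PROXY;
    }

    public static void main(String[] args) {
        List<ProxySetting> settings = Arrays.asList(
                new ProxySetting(false, "old-proxy.example.org", 3128),
                new ProxySetting(true, "proxy.example.org", 8080));
        Proxy proxy = select(settings);
        System.out.println("Using proxy? " + (proxy != Proxy.NO_PROXY) + " : " + proxy);
        // The result could then be passed to java.net.URL#openConnection(Proxy).
    }
}
```

Falling back to `Proxy.NO_PROXY` keeps the direct-connection case working when no proxy is configured.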

        tallison@mitre.org Tim Allison added a comment -

        Y. Works. Thank you!

        Using the first Proxy setting : null@ something.or.other.org : XX
        Proxy is configured
        GET : http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin -> tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\ner-person.bin (Using proxy? true)
        10.2388212797% : 533233 bytes of 5207953
        45.2644829936% : 2357353 bytes of 5207953
        84.3405652854% : 4392417 bytes of 5207953
        Copy complete.
        Download Complete..
        GET : http://opennlp.sourceforge.net/models-1.5/en-ner-location.bin -> tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\ner-location.bin (Using proxy? true)
        40.0848188237% : 2048598 bytes of 5110658
        65.4000717716% : 3342374 bytes of 5110658
        67.4921702841% : 3449294 bytes of 5110658
        Copy complete.
        Download Complete..
        GET : http://opennlp.sourceforge.net/models-1.5/en-ner-organization.bin -> tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\ner-organization.bin (Using proxy? true)
        39.3755384949% : 2085790 bytes of 5297172
        75.4165052598% : 3994942 bytes of 5297172
        Copy complete.
        Download Complete..
        GET : http://opennlp.sourceforge.net/models-1.5/en-ner-date.bin -> tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\ner-date.bin (Using proxy? true)
        43.0595985494% : 2166030 bytes of 5030307
        84.5145634253% : 4251342 bytes of 5030307
        Copy complete.
        Download Complete..
        

        ...snip...

        Running org.apache.tika.parser.ner.NamedEntityParserTest
        11 Jan 2016 09:01:15  INFO NamedEntityParser - going to load, instantiate and bind the instance of org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
        11 Jan 2016 09:01:16  INFO OpenNLPNameFinder - LOCATION NER : Available for service ? true
        11 Jan 2016 09:01:16  INFO OpenNLPNameFinder - ORGANIZATION NER : Available for service ? true
        11 Jan 2016 09:01:17  INFO OpenNLPNameFinder - DATE NER : Available for service ? true
        11 Jan 2016 09:01:17  WARN OpenNLPNameFinder - Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-money.bin using class loader
        11 Jan 2016 09:01:17  INFO OpenNLPNameFinder - MONEY NER : Available for service ? false
        11 Jan 2016 09:01:17  INFO OpenNLPNameFinder - PERSON NER : Available for service ? true
        11 Jan 2016 09:01:17  WARN OpenNLPNameFinder - Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-percentage.bin using class loader
        11 Jan 2016 09:01:17  INFO OpenNLPNameFinder - PERCENT NER : Available for service ? false
        11 Jan 2016 09:01:17  WARN OpenNLPNameFinder - Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-time.bin using class loader
        11 Jan 2016 09:01:17  INFO OpenNLPNameFinder - TIME NER : Available for service ? false
        11 Jan 2016 09:01:17  INFO NamedEntityParser - org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser is available ? true
        11 Jan 2016 09:01:17  INFO NamedEntityParser - going to load, instantiate and bind the instance of org.apache.tika.parser.ner.regex.RegexNERecogniser
        11 Jan 2016 09:01:17  INFO NamedEntityParser - org.apache.tika.parser.ner.regex.RegexNERecogniser is available ? true
        11 Jan 2016 09:01:17  INFO NamedEntityParser - Number of NERecognisers in chain 2
        11 Jan 2016 09:01:17  INFO NamedEntityParser - going to load, instantiate and bind the instance of org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
        11 Jan 2016 09:01:18  INFO OpenNLPNameFinder - LOCATION NER : Available for service ? true
        11 Jan 2016 09:01:18  INFO OpenNLPNameFinder - ORGANIZATION NER : Available for service ? true
        11 Jan 2016 09:01:19  INFO OpenNLPNameFinder - DATE NER : Available for service ? true
        11 Jan 2016 09:01:19  WARN OpenNLPNameFinder - Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-money.bin using class loader
        11 Jan 2016 09:01:19  INFO OpenNLPNameFinder - MONEY NER : Available for service ? false
        11 Jan 2016 09:01:19  INFO OpenNLPNameFinder - PERSON NER : Available for service ? true
        11 Jan 2016 09:01:19  WARN OpenNLPNameFinder - Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-percentage.bin using class loader
        11 Jan 2016 09:01:19  INFO OpenNLPNameFinder - PERCENT NER : Available for service ? false
        11 Jan 2016 09:01:19  WARN OpenNLPNameFinder - Couldn't find model from org/apache/tika/parser/ner/opennlp/ner-time.bin using class loader
        11 Jan 2016 09:01:19  INFO OpenNLPNameFinder - TIME NER : Available for service ? false
        11 Jan 2016 09:01:19  INFO NamedEntityParser - org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser is available ? true
        11 Jan 2016 09:01:19  INFO NamedEntityParser - going to load, instantiate and bind the instance of org.apache.tika.parser.ner.regex.RegexNERecogniser
        11 Jan 2016 09:01:19  INFO NamedEntityParser - org.apache.tika.parser.ner.regex.RegexNERecogniser is available ? true
        11 Jan 2016 09:01:19  INFO NamedEntityParser - Number of NERecognisers in chain 2
        Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.373 sec - in org.apache.tika.parser.ner.NamedEntityParserTest
        Running org.apache.tika.parser.ner.regex.RegexNERecogniserTest
        11 Jan 2016 09:01:19  INFO NamedEntityParser - going to load, instantiate and bind the instance of org.apache.tika.parser.ner.regex.RegexNERecogniser
        11 Jan 2016 09:01:19  INFO NamedEntityParser - org.apache.tika.parser.ner.regex.RegexNERecogniser is available ? true
        11 Jan 2016 09:01:19  INFO NamedEntityParser - Number of NERecognisers in chain 1
        Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.015 sec - in org.apache.tika.parser.ner.regex.RegexNERecogniserTest
        

        And then the tests are run and the build works. Thank you!

        tallison@mitre.org Tim Allison added a comment -

        Will commit in the next few minutes, unless Chris wants to?

        chrismattmann Chris A. Mattmann added a comment -

        +1 go for it Tim

        tallison@mitre.org Tim Allison added a comment -

        Committed in trunk r1724034.

        This needs to be reworked ever so slightly to work in 2.x.

        Thank you, again, Thamme Gowda!

        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-trunk-jdk1.7 #901 (See https://builds.apache.org/job/tika-trunk-jdk1.7/901/)
        Fix for TIKA-1816: Lenient testing for NamedEntityParser contributed by Thamme Gowda <tgowdan@gmail.com> (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1724034)

        • trunk/tika-parsers/pom.xml
        • trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/ModelGetter.groovy
        chrismattmann Chris A. Mattmann added a comment -

        -fixed

        tallison@mitre.org Tim Allison added a comment -

        Thamme Gowda, if you have a chance, would you be willing to try your hand at a patch for the 2.x branch? I'm not having any luck.

        thammegowda Thamme Gowda added a comment -

        Tim Allison Sure, I will have a look.

        Correct me if I am wrong (as I have been a little away from the 2.x discussions):
        The NER is now provided by tika-parser-advanced-module, so the tests should be set up over there, am I correct?

        tallison@mitre.org Tim Allison added a comment -

        2.x is still very much in flux! We/Bob Paulin initially put the setup code over in tika-test-resources with all of the test documents for all of the parsers. Given that the test docs for all parsers are there, I think it makes sense to put all of the test infrastructure there. However, if you have any preference to move it back to the advanced module, let us know. 2.x is wide open.

        bobpaulin Bob Paulin added a comment -

        Thamme Gowda In the 2.x branch, all the NER testing files and scripts/infrastructure are in the tika-test-resources project. This was done because many projects share these documents in their JUnits. The JUnits themselves are not currently shared, so those remain in the tika-parsers-advanced-module. Hope this helps. Happy to discuss alternatives if you have feedback.

        thammegowda Thamme Gowda added a comment -

        Looks Good.

        I just confirmed that the tika-test-resources dependency is added to the modules for the test goal. Indeed, this is best!

        Thanks.

        tallison@mitre.org Tim Allison added a comment -

        Reopening until this works in 2.x.

        githubbot ASF GitHub Bot added a comment -

        GitHub user thammegowda opened a pull request:

        https://github.com/apache/tika/pull/84

        TIKA1816 : NER model download via maven proxy ( from 1.x to 2.x)

        This PR brings proxy based downloading feature from 1.x branch to 2.x

        Closes TIKA-1816

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/thammegowda/tika 2.x-TIKA-1816

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tika/pull/84.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #84


        commit c4feaff19187f548730f48a77fc437ca12bb40b4
        Author: Thamme Gowda <tgowdan@gmail.com>
        Date: 2016-03-02T09:12:26Z

        Copy Proxy download fix to 2.x


        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tika/pull/84

        tallison@mitre.org Tim Allison added a comment -

        Works locally, at least. Thank you!


          People

          • Assignee: Unassigned
          • Reporter: Thamme Gowda
          • Votes: 0
          • Watchers: 6
