TIKA-2298

Improve the object recognition parser so that it works without an external RESTful service setup

    Details

    • Flags:
      Patch

      Description

      When ObjectRecognitionParser was built to do image recognition, Java
      frameworks offered little support for it. All the popular neural network
      toolkits were in C++ or Python. Since nothing ran within the JVM, we tried
      several ways to glue them to Tika (CLI, JNI, gRPC, REST).
      However, this is slowly changing. Deeplearning4j, the most widely known
      neural network library for the JVM, now supports importing models that are
      pre-trained in Python/C++ based kits [5].

      Improvement:
      It would be nice to have an implementation of ObjectRecogniser that
      doesn't require any external setup (like installing native libraries or
      starting REST services). Reasons: it is easier to distribute and it cuts
      the IO time.

        Issue Links

          Activity

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-325745416

          thanks @boegel I'll get this committed today!

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          githubbot ASF GitHub Bot added a comment -

          boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-325745167

          @chrismattmann Yup, I just increased the "read timeout". PR at #203

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-325726468

          ping @boegel

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-325015785

          thanks @boegel if you can submit a PR I'll commit the above, looks like you just increased the max timeout right?

          githubbot ASF GitHub Bot added a comment -

          boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-324894453

          I was able to dance around the issue with the following patch:

          ```
          --- tika-1.16/tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JInceptionV3Net.java.orig 2017-08-25 10:01:28.324036746 +0200
          +++ tika-1.16/tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JInceptionV3Net.java 2017-08-25 10:01:49.534306082 +0200
          @@ -213,7 +213,7 @@
           }
           LOG.info("Cache doesn't exist. Going to make a copy");
           LOG.info("This might take a while! GET {}", uri);
          -FileUtils.copyURLToFile(uri.toURL(), cacheFile, 5000, 5000);
          +FileUtils.copyURLToFile(uri.toURL(), cacheFile, 5000, 50000);
           //restore the success flag again
           FileUtils.write(successFlag,
           "CopiedAt:" + System.currentTimeMillis(),
          ```

          The download of the 90MB `inception-model-weights.h5` was timing out after 5s, which seems a bit tight to me?
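
For readers following the patch, here is a minimal, self-contained sketch of the same idea: a cached download with separate connect and read timeouts, plus a success flag. It is not the Tika code. The class `CachedDownload`, the method `fetch`, and the `.success` flag name are hypothetical, and it uses the JDK's `URLConnection` in place of Commons IO. Note that a read timeout bounds each blocking read rather than the whole transfer, which would fit the stack trace in this thread, where the timeout fires while the HTTP response headers are still being read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Tika or Deeplearning4j. */
final class CachedDownload {

    private CachedDownload() {
    }

    /**
     * Copies uri to cacheFile unless a success flag from an earlier run exists.
     * Returns true if the data was fetched, false if the cached copy was reused.
     */
    static boolean fetch(URI uri, Path cacheFile, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Path successFlag = cacheFile.resolveSibling(cacheFile.getFileName() + ".success");
        if (Files.exists(cacheFile) && Files.exists(successFlag)) {
            return false; // an earlier download (or a manual seed) already completed
        }
        // Drop a stale flag so an interrupted copy is never mistaken for a good one.
        Files.deleteIfExists(successFlag);

        URLConnection connection = uri.toURL().openConnection();
        connection.setConnectTimeout(connectTimeoutMs); // time allowed to open the connection
        connection.setReadTimeout(readTimeoutMs);       // longest wait for a single read to return data
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, cacheFile, StandardCopyOption.REPLACE_EXISTING);
        }
        // Record success only after the copy has finished.
        Files.write(successFlag,
                ("CopiedAt:" + System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // A local file: URI stands in for the remote weights so the sketch runs offline.
        Path source = Files.createTempFile("weights", ".h5");
        Files.write(source, "demo".getBytes(StandardCharsets.UTF_8));
        Path cache = Files.createTempDirectory("cache").resolve("weights.h5");

        System.out.println(fetch(source.toUri(), cache, 5000, 50000)); // first call copies
        System.out.println(fetch(source.toUri(), cache, 5000, 50000)); // second call reuses the cache
    }
}
```

In this sketch, placing the file and its flag by hand has the same effect as a completed download, which is one way to handle a machine behind a firewall. Whether the real Tika cache can be seeded the same way depends on its actual cache layout, which this thread does not show.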

          githubbot ASF GitHub Bot added a comment -

          agibsonccc commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-324821344

          Timeout issues like these are common. They usually have to do with a VPN or proxy. If you have issues, please feel free to come talk to us directly. Thanks!

          githubbot ASF GitHub Bot added a comment -

          boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-324757211

          @chrismattmann I think it's actually https://raw.githubusercontent.com/USCDataScience/dl4j-kerasimport-examples/98ec48b56a5b8fb7d54a2994ce9cb23bfefac821/dl4j-import-example/data/inception-model-weights.h5, which is a 90MB download...

          Cfr. https://github.com/apache/tika/blob/master/tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-inception3-config.xml#L27 (which is used as input to `TikaConfig`).

          I guess the download is taking too long for some reason?

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-324750562

          @boegel check out: https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-modelimport/src/main/java/org/deeplearning4j/nn/modelimport/keras/trainedmodels/TrainedModels.java (looks like it's a GitHub URL?)

          githubbot ASF GitHub Bot added a comment -

          boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-324747665

          @chrismattmann No, network works fine... I am behind a firewall though, maybe that's the issue.
          What is the test trying to download exactly, and where can I seed in what it wants?

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-324746641

          hi @boegel are you on a computer that doesn't have a net connection? You just need that model to download once...

          githubbot ASF GitHub Bot added a comment -

          boegel commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-324745266

          I'm trying to build and install Tika 1.16 from source, and I'm running into a failing test; it seems like this test was added in this PR.

          Any pointers to what is wrong here? How can I debug this further?

          ```
          Running org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
          SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
          SLF4J: Defaulting to no-operation (NOP) logger implementation
          SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
          Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.559 sec <<< FAILURE! - in org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
          recognise(org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest) Time elapsed: 5.556 sec <<< ERROR!
          org.apache.tika.exception.TikaConfigException: Read timed out
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
          at java.net.SocketInputStream.read(SocketInputStream.java:171)
          at java.net.SocketInputStream.read(SocketInputStream.java:141)
          at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
          at sun.security.ssl.InputRecord.read(InputRecord.java:503)
          at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
          at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
          at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
          at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
          at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
          at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
          at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
          at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
          at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:1506)
          at org.apache.tika.dl.imagerec.DL4JInceptionV3Net.cachedDownload(DL4JInceptionV3Net.java:216)
          at org.apache.tika.dl.imagerec.DL4JInceptionV3Net.initialize(DL4JInceptionV3Net.java:232)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:168)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:161)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:157)
          at org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest.recognise(DL4JInceptionV3NetTest.java:33)

          Running org.apache.tika.dl.imagerec.DL4JVGG16NetTest
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.6 sec - in org.apache.tika.dl.imagerec.DL4JVGG16NetTest

          Results :

          Tests in error:
          DL4JInceptionV3NetTest.recognise:33 » TikaConfig Read timed out

          Tests run: 2, Failures: 0, Errors: 1, Skipped: 0
          ```

          chrismattmann Chris A. Mattmann added a comment -

          docs added here: https://wiki.apache.org/tika/AgeDetectionParser and linked from front page

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313446688

          Thank you guys!! @chrismattmann @thammegowda @tballison. This is my first merge in a major repository and I am very excited! Once again, thanks!
          I will surely come up with the documentation soon, Chris.

          tallison@mitre.org Tim Allison added a comment -

          <face_palm>I knew this was the wrong week to go off coffee...</face_palm>

          chrismattmann Chris A. Mattmann added a comment -

          fixed, was a simple typo - you forgot to set the config object = the new TikaConfig
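
The kind of slip described here can be sketched with stand-in classes. This is only an illustration under assumptions: `Config` and `Facade` are hypothetical placeholders, not the Tika classes, and the actual test code is not shown in this thread. The point is that a freshly constructed object which is never assigned leaves the variable `null`, and a constructor that dereferences it then fails with a `NullPointerException`, like the one reported at `Tika.<init>`.

```java
/** Hypothetical illustration; Config and Facade are stand-ins, not Tika classes. */
final class ConfigAssignmentSketch {

    /** Stand-in for a configuration object. */
    static final class Config {
        final String name;

        Config(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a facade that dereferences its configuration in the constructor. */
    static final class Facade {
        final String configName;

        Facade(Config config) {
            this.configName = config.name; // fails here when config is null
        }
    }

    /** The slip: the new object is created but never assigned. */
    static Facade broken() {
        Config config = null;
        new Config("vgg16");       // result is discarded, config stays null
        return new Facade(config); // throws NullPointerException
    }

    /** The fix: assign the new object before using it. */
    static Facade fixed() {
        Config config = new Config("vgg16");
        return new Facade(config);
    }

    public static void main(String[] args) {
        try {
            broken();
        } catch (NullPointerException e) {
            System.out.println("broken() failed with a NullPointerException");
        }
        System.out.println("fixed() built a facade for " + fixed().configName);
    }
}
```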

          chrismattmann Chris A. Mattmann added a comment -

          docs added in: https://wiki.apache.org/tika/TikaAndVisionDL4J

          chrismattmann Chris A. Mattmann added a comment -

          Tim Allison your latest update causes Jenkins and my local build to fail:

          -------------------------------------------------------
           T E S T S
          -------------------------------------------------------
          Running org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
          SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
          SLF4J: Defaulting to no-operation (NOP) logger implementation
          SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.268 sec - in org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
          Running org.apache.tika.dl.imagerec.DL4JVGG16NetTest
          Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.353 sec <<< FAILURE! - in org.apache.tika.dl.imagerec.DL4JVGG16NetTest
          recognise(org.apache.tika.dl.imagerec.DL4JVGG16NetTest)  Time elapsed: 6.353 sec  <<< ERROR!
          java.lang.NullPointerException: null
          	at org.apache.tika.Tika.<init>(Tika.java:109)
          	at org.apache.tika.dl.imagerec.DL4JVGG16NetTest.recognise(DL4JVGG16NetTest.java:40)
          
          
          Results :
          
          Tests in error: 
            DL4JVGG16NetTest.recognise:40 » NullPointer
          
          Tests run: 2, Failures: 0, Errors: 1, Skipped: 0
          
          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Summary:
          [INFO] 
          [INFO] Apache Tika parent ................................. SUCCESS [  1.169 s]
          [INFO] Apache Tika core ................................... SUCCESS [ 23.745 s]
          [INFO] Apache Tika parsers ................................ SUCCESS [03:20 min]
          [INFO] Apache Tika XMP .................................... SUCCESS [  1.323 s]
          [INFO] Apache Tika serialization .......................... SUCCESS [  1.114 s]
          [INFO] Apache Tika batch .................................. SUCCESS [01:47 min]
          [INFO] Apache Tika language detection ..................... SUCCESS [  2.683 s]
          [INFO] Apache Tika application ............................ SUCCESS [ 43.016 s]
          [INFO] Apache Tika OSGi bundle ............................ SUCCESS [ 18.439 s]
          [INFO] Apache Tika translate .............................. SUCCESS [  1.794 s]
          [INFO] Apache Tika server ................................. SUCCESS [ 36.437 s]
          [INFO] Apache Tika examples ............................... SUCCESS [  5.494 s]
          [INFO] Apache Tika Java-7 Components ...................... SUCCESS [  1.815 s]
          [INFO] Apache Tika eval ................................... SUCCESS [ 22.354 s]
          [INFO] Apache Tika Deep Learning (powered by DL4J) ........ FAILURE [ 14.242 s]
          [INFO] Apache Tika ........................................ SKIPPED
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD FAILURE
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 08:01 min
          [INFO] Finished at: 2017-07-05T18:33:59-07:00
          [INFO] Final Memory: 126M/1659M
          [INFO] ------------------------------------------------------------------------
          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project tika-dl: There are test failures.
          [ERROR] 
          [ERROR] Please refer to /Users/mattmann/tmp/tika1.15/tika-dl/target/surefire-reports for the individual test results.
          [ERROR] -> [Help 1]
          [ERROR] 
          [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
          [ERROR] Re-run Maven using the -X switch to enable full debug logging.
          [ERROR] 
          [ERROR] For more information about the errors and possible solutions, please read the following articles:
          [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
          [ERROR] 
          [ERROR] After correcting the problems, you can resume the build with the command
          [ERROR]   mvn <goals> -rf :tika-dl
          LMC-053601:tika1.15 mattmann$ 
          
          

          I'm going to try and fix real quick.

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Tika-trunk #1310 (See https://builds.apache.org/job/Tika-trunk/1310/)
          TIKA-2298: DL4J-VGG16 simplify conf, implementation (thammegowda: https://github.com/apache/tika/commit/c476ec14efe2d9007f461ecf09ccd2ade4ffc197)
          (edit) tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml
          (edit) tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
          Record change for TIKA-2298: Very Deep Convolutional Networks for (mattmann: https://github.com/apache/tika/commit/b58cfcf1935d138065eb4a090ba4c1fef17ddacd)
          (edit) CHANGES.txt
          TIKA-2298 – skip test if no network connectivity. Should rework for (tallison: https://github.com/apache/tika/commit/158675def02810d116e7cdab8409c121a88e77eb)
          (edit) tika-dl/src/test/java/org/apache/tika/dl/imagerec/DL4JVGG16NetTest.java

          chrismattmann Chris A. Mattmann added a comment -

          YES sounds perfect thanks Tim Allison

          tallison@mitre.org Tim Allison added a comment -

          I'm having the usual proxy problems in my environment with the network call. Mind if I try/catch/swallow TikaConfigurationException with message.contains("Connection refused")?
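
The try/catch/swallow idea above could look roughly like the following. This is only a sketch under stated assumptions, not the change that was committed: `ConfigException`, `Init`, `isConnectionRefused` and `initOrSkip` are hypothetical names invented for illustration, and `ConfigException` merely stands in for the Tika configuration exception mentioned in the comment. Only a failure whose message chain contains "Connection refused" is swallowed; anything else is rethrown so that real errors still fail the test.

```java
import java.net.ConnectException;

public class ConnectionRefusedCheck {

    /** Hypothetical stand-in for the configuration exception; NOT the real Tika class. */
    public static class ConfigException extends Exception {
        public ConfigException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical hook for initialization that may need network access. */
    public interface Init {
        void run() throws Exception;
    }

    /** Returns true if the throwable or any of its causes mentions "Connection refused". */
    public static boolean isConnectionRefused(Throwable t) {
        // Walk the cause chain; the depth bound guards against cyclic causes.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            String msg = t.getMessage();
            if (msg != null && msg.contains("Connection refused")) {
                return true;
            }
        }
        return false;
    }

    /** Runs init; returns false (meaning "skip") on connection refused, rethrows anything else. */
    public static boolean initOrSkip(Init init) throws Exception {
        try {
            init.run();
            return true;
        } catch (Exception e) {
            if (isConnectionRefused(e)) {
                System.err.println("Skipping: no network connectivity (" + e.getMessage() + ")");
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a model download failing behind a proxy.
        boolean ran = initOrSkip(() -> {
            throw new ConfigException("Could not download model",
                    new ConnectException("Connection refused"));
        });
        System.out.println("ran = " + ran);
    }
}
```

In an actual JUnit test the `false` branch would more likely be reported through the framework's assumption mechanism so the test shows up as skipped rather than passed; that wiring is left out here to keep the sketch dependency-free.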

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313268447

          @asmehra95 and @thammegowda please add a page like https://wiki.apache.org/tika/TikaAndVisionDL4J on the Tika Wiki or add to that page and show how to use the VGG16 model. Should be pretty quick thanks!

          chrismattmann Chris A. Mattmann added a comment -

          Thanks to Avtar Singh, Thamme Gowda, and Tim Allison for their help; this is now merged into 1.16!

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r125792563

          ##########
          File path: tika-dl/pom.xml
          ##########
          @@ -87,6 +87,11 @@
          <artifactId>nd4j-native-platform</artifactId>
          <version>${dl4j.version}</version>
          </dependency>
          + <dependency>
          + <groupId>org.apache.commons</groupId>
          + <artifactId>commons-compress</artifactId>
          + <version>1.14</version>

          Review comment:
          fixed in https://github.com/apache/tika/commit/94f8b9fe5fdaebd11a99e76dd742bdc6df427389

          githubbot ASF GitHub Bot added a comment -

          chrismattmann closed pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r125792055

          ##########
          File path: tika-dl/pom.xml
          ##########
          @@ -87,6 +87,11 @@
          <artifactId>nd4j-native-platform</artifactId>
          <version>${dl4j.version}</version>
          </dependency>
          + <dependency>
          + <groupId>org.apache.commons</groupId>
          + <artifactId>commons-compress</artifactId>
          + <version>1.14</version>

          Review comment:
          on it! thanks @tballison

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313265776

          OK, I got it working, great job @asmehra95! I am good to merge this into 1.16. Let me double check there are no objections (if so we can back it out).

            1. Build passes

          ```
          [INFO] Loading classes to check...
          [INFO] Scanning classes for violations...
          [INFO] Scanned 2 (and 230 related) class file(s) for forbidden API invocations (in 0.04s), 0 error(s).
          [INFO]
          [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-dl ---
          [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.jar
          [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.pom
          [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 03:48 min
          [INFO] Finished at: 2017-07-05T17:24:47-07:00
          [INFO] Final Memory: 129M/1177M
          [INFO] ------------------------------------------------------------------------
          LMC-053601:tika-dl mattmann$
          ```

            2. Running Lion Image Recognition Test
              ```bash
              $ cat test.sh
              java -Xmx3G -cp ./tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.16-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
              ```

          ```bash
          LMC-053601:tika1.15 mattmann$ sh test.sh
          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          J2KImageReader not loaded. JPEG2000 files will not be processed.
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.

          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: Tesseract OCR is installed and will be automatically applied to image files.
          This may dramatically slow down content extraction (TIKA-2359).
          As of Tika 1.15 (and prior versions), Tesseract is automatically called.
          In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: org.xerial's sqlite-jdbc is not loaded.
          Please provide the jar on your classpath to parse sqlite files.
          See tika-parsers/pom.xml for the correct version.
          INFO Loaded [CpuBackend] backend
          INFO Number of threads used for NativeOps: 4
          INFO Reflections took 130 ms to scan 1 urls, producing 29 keys and 189 values
          INFO Number of threads used for BLAS: 4
          INFO Backend used: [CPU]; OS: [Mac OS X]
          INFO Cores: [8]; Memory: [2.7GB];
          INFO Blas vendor: [OPENBLAS]
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libJ3DAudio.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libAppleScriptEngine.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libJ3D.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/usr/lib/java/libjdns_sd.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libmlib_jai.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libJ3DUtils.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          INFO Reflections took 1529 ms to scan 12 urls, producing 3728 keys and 16714 values
          INFO Preprocessed Model Loaded from /Users/mattmann/.dl4j/trainedmodels/tikaPreprocessed/vgg16.zip
          INFO minConfidence = 0.015, topN=3
          INFO Recogniser = org.apache.tika.dl.imagerec.DL4JVGG16Net
          INFO Recogniser Available = true
          INFO Reflections took 134 ms to scan 1 urls, producing 371 keys and 1443 values
          <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
          <head>
          <meta name="org.apache.tika.parser.recognition.object.rec.impl" content="org.apache.tika.dl.imagerec.DL4JVGG16Net"/>
          <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
          <meta name="X-Parsed-By" content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
          <meta name="resourceName" content="lion.jpg"/>
          <meta name="Content-Length" content="44441"/>
          <meta name="OBJECT" content="lion (0.99999)"/>
          <meta name="Content-Type" content="image/jpeg"/>
          <title/>
          </head>
          <body><ol id="objects"> <li id="lion"> lion [eng](confidence = 0.999988 )</li>
          </ol>
          </body></html>
          ```

          Yay! Works!


          githubbot ASF GitHub Bot added a comment -

          tballison commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r125791986

          ##########
          File path: tika-dl/pom.xml
          ##########
          @@ -87,6 +87,11 @@
          <artifactId>nd4j-native-platform</artifactId>
          <version>${dl4j.version}</version>
          </dependency>
          + <dependency>
          + <groupId>org.apache.commons</groupId>
          + <artifactId>commons-compress</artifactId>
          + <version>1.14</version>

          Review comment:
          commons.compress.version is set in tika-parent's pom. Reference that here ${commons.compress.version} so we don't have to worry about coordination/conflicts


          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313267082

          I have not configured Maven based on available memory so far. It should be
          possible to hack it based on an ENV variable or a system property.
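
A rough sketch of what such an ENV or system-property check might look like in plain Java follows. The class name `HeapGate`, the property `tika.dl.minHeapMb`, the variable `TIKA_DL_MIN_HEAP_MB`, and the 2048 MB default are all hypothetical placeholders, not part of Tika or of this pull request. The run logged elsewhere in this thread was started with `-Xmx3G` and reported `Memory: [2.7GB]`, so (assuming that figure reflects the JVM's max heap) a threshold probably needs to sit somewhat below the nominal `-Xmx` value.

```java
/**
 * Sketch only: decides whether a memory-hungry test should run.
 * The property name "tika.dl.minHeapMb" and the environment variable
 * "TIKA_DL_MIN_HEAP_MB" are hypothetical and do not exist in Tika.
 */
final class HeapGate {

    /** Placeholder default threshold in MB (an assumption; tune as needed). */
    static final long DEFAULT_MIN_HEAP_MB = 2048L;

    /** Resolves the threshold: system property first, then env value, then default. */
    static long requiredHeapMb(String sysProp, String envValue) {
        String raw = (sysProp != null) ? sysProp : envValue;
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MIN_HEAP_MB;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_MIN_HEAP_MB; // fall back to the default on malformed input
        }
    }

    /** True when the given max heap (in bytes) meets the threshold (in MB). */
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredMb) {
        return maxHeapBytes >= requiredMb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        long requiredMb = requiredHeapMb(
                System.getProperty("tika.dl.minHeapMb"),
                System.getenv("TIKA_DL_MIN_HEAP_MB"));
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println(hasEnoughHeap(maxHeap, requiredMb)
                ? "run memory-heavy tests"
                : "skip memory-heavy tests");
    }
}
```

In a JUnit 4 test, the boolean could be passed to `org.junit.Assume.assumeTrue(...)` so that the test is reported as skipped rather than failed when the heap is too small.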

          On Jul 5, 2017 5:49 PM, "Chris Mattmann" <notifications@github.com> wrote:

          > @thammegowda <https://github.com/thammegowda> I would say it's OK - do
          > you know if there is a Maven plugin to only run tests if a certain amount
          > of RAM is available? I think I could easily hack this using properties, but
          > just checking first.


          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313265776

          OK, I got it working, great job @asmehra95! I am good to merge this into 1.16. Let me double check there are no objections (if so we can back it out).

          Build passes

          ```
          [INFO] Loading classes to check...
          [INFO] Scanning classes for violations...
          [INFO] Scanned 2 (and 230 related) class file(s) for forbidden API invocations (in 0.04s), 0 error(s).
          [INFO]
          [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-dl ---
          [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.jar
          [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT.pom
          [INFO] Installing /Users/mattmann/tmp/tika1.15/tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-dl/1.16-SNAPSHOT/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 03:48 min
          [INFO] Finished at: 2017-07-05T17:24:47-07:00
          [INFO] Final Memory: 129M/1177M
          [INFO] ------------------------------------------------------------------------
          LMC-053601:tika-dl mattmann$
          ```

          Running Lion Image Recognition Test

          ```bash
          $cat test.sh
          java -Xmx3G -cp ./tika-dl/target/tika-dl-1.16-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.16-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg
          ```

          ```bash
          LMC-053601:tika1.15 mattmann$ sh test.sh
          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          J2KImageReader not loaded. JPEG2000 files will not be processed.
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.

          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: Tesseract OCR is installed and will be automatically applied to image files.
          This may dramatically slow down content extraction (TIKA-2359).
          As of Tika 1.15 (and prior versions), Tesseract is automatically called.
          In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
          Jul 05, 2017 5:51:03 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: org.xerial's sqlite-jdbc is not loaded.
          Please provide the jar on your classpath to parse sqlite files.
          See tika-parsers/pom.xml for the correct version.
          INFO Loaded [CpuBackend] backend
          INFO Number of threads used for NativeOps: 4
          INFO Reflections took 130 ms to scan 1 urls, producing 29 keys and 189 values
          INFO Number of threads used for BLAS: 4
          INFO Backend used: [CPU]; OS: [Mac OS X]
          INFO Cores: [8]; Memory: [2.7GB];
          INFO Blas vendor: [OPENBLAS]
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libJ3DAudio.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libAppleScriptEngine.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libJ3D.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/usr/lib/java/libjdns_sd.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libmlib_jai.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          WARN could not create Vfs.Dir from url. ignoring the exception and continuing
          org.reflections.ReflectionsException: could not create Vfs.Dir from url, no matching UrlType was found file:/System/Library/Java/Extensions/libJ3DUtils.jnilib
          either use fromURL(final URL url, final List<UrlType> urlTypes) or use the static setDefaultURLTypes(final List<UrlType> urlTypes) or addDefaultURLTypes(UrlType urlType) with your specialized UrlType.
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:109)
          at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
          at org.reflections.Reflections.scan(Reflections.java:237)
          at org.reflections.Reflections.scan(Reflections.java:204)
          at org.reflections.Reflections.<init>(Reflections.java:129)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.registerSubtypes(NeuralNetConfiguration.java:431)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.configureMapper(NeuralNetConfiguration.java:386)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.initMapper(NeuralNetConfiguration.java:376)
          at org.deeplearning4j.nn.conf.NeuralNetConfiguration.<clinit>(NeuralNetConfiguration.java:123)
          at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:134)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:456)
          at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:356)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:97)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:164)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:135)
          at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:684)
          at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
          at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:143)
          INFO Reflections took 1529 ms to scan 12 urls, producing 3728 keys and 16714 values
          INFO Preprocessed Model Loaded from /Users/mattmann/.dl4j/trainedmodels/tikaPreprocessed/vgg16.zip
          INFO minConfidence = 0.015, topN=3
          INFO Recogniser = org.apache.tika.dl.imagerec.DL4JVGG16Net
          INFO Recogniser Available = true
          INFO Reflections took 134 ms to scan 1 urls, producing 371 keys and 1443 values
          <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
          <head>
          <meta name="org.apache.tika.parser.recognition.object.rec.impl" content="org.apache.tika.dl.imagerec.DL4JVGG16Net"/>
          <meta name="X-Parsed-By" content="org.apache.tika.parser.CompositeParser"/>
          <meta name="X-Parsed-By" content="org.apache.tika.parser.recognition.ObjectRecognitionParser"/>
          <meta name="resourceName" content="lion.jpg"/>
          <meta name="Content-Length" content="44441"/>
          <meta name="OBJECT" content="lion (0.99999)"/>
          <meta name="Content-Type" content="image/jpeg"/>
          <title/>
          </head>
          <body><ol id="objects"> <li id="lion"> lion [eng](confidence = 0.999988 )</li>
          </ol>
          </body></html>
          ```

          Yay! Works!


          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313265222

          @thammegowda I would say it's OK - do you know if there is a Maven plugin to only run tests if a certain amount of RAM is available? I think I could easily hack this using properties, but just checking first.
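The idea of running the heavy test only when enough RAM is available can also be handled inside the test itself rather than through a Maven plugin. The following is a minimal sketch under that assumption; `MemoryGate` and `hasEnoughHeap` are hypothetical names and are not part of Tika or Maven.

```java
// Hypothetical helper (not part of Tika): checks the JVM heap ceiling
// before a memory-hungry test is allowed to run.
final class MemoryGate {
    private MemoryGate() {}

    /** Returns true if this JVM's maximum heap is at least requiredBytes. */
    static boolean hasEnoughHeap(long requiredBytes) {
        // maxMemory() reports the heap ceiling (roughly -Xmx),
        // or Long.MAX_VALUE when the JVM has no inherent limit.
        return Runtime.getRuntime().maxMemory() >= requiredBytes;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println("Max heap (bytes): " + Runtime.getRuntime().maxMemory());
        System.out.println("Enough heap for a 3 GiB test: " + hasEnoughHeap(threeGiB));
    }
}
```

In a JUnit 4 test, `org.junit.Assume.assumeTrue(MemoryGate.hasEnoughHeap(...))` at the top of the test method would report the test as skipped instead of failed when the check is false. Because `maxMemory()` can report slightly less than the configured `-Xmx`, the threshold is probably best set a little below the nominal value.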

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313261299

          Now the question is: to accommodate this VGG model (with unit tests), we need to increase the memory requirement for the Tika build system to 3GB.
          Is this okay to do?

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313261053

          The export statements in bash are not considered by Maven.
          That's because the value set in pom.xml overrides those exports.

          https://github.com/apache/tika/blob/master/tika-parent/pom.xml#L359
          This model requires 3GB, and hence tika-parent should be updated to reflect the same.
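One way to confirm which heap settings actually reach the JVM that runs the tests is to print them from inside that JVM. The sketch below assumes the tests run in a separate JVM forked by Surefire, which would explain why an exported `MAVEN_OPTS` does not affect them; `JvmMemoryReport` and `heapFlags` are hypothetical names used only for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic (not part of Tika): lists the heap flags
// that the current JVM was started with.
final class JvmMemoryReport {
    private JvmMemoryReport() {}

    /** Returns the -Xmx / -Xms flags found in the given JVM arguments, in order. */
    static List<String> heapFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-Xms")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM itself, not to a parent process.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap flags seen by this JVM: " + heapFlags(jvmArgs));
        System.out.println("Max heap (MiB): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Calling `heapFlags(...)` from a test's setup method and logging the result would show whether the value from the POM or the exported one is in effect.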

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313252683

          BTW see my branch (I had to fix some errors in compilation along the way):

          https://github.com/apache/tika/compare/master...chrismattmann:TIKA-2298

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313252275

          So @asmehra95 @thammegowda I have been testing this out. I can't get the unit tests to pass. See below:

          ```bash
          LMC-053601:tika-dl mattmann$ history | grep export
          546 export MAVEN_OPTS="-Xms2048m"
          548 export MAVEN_OPTS="-Xmx3G"
          550 history | grep export
          LMC-053601:tika-dl mattmann$

          ```

          ```bash
          [INFO]
          [INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ tika-dl ---
          [INFO] Using 'UTF-8' encoding to copy filtered resources.
          [INFO] Copying 2 resources
          [INFO] Copying 3 resources
          [INFO]
          [INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ tika-dl ---
          [INFO] Changes detected - recompiling the module!
          [INFO] Compiling 2 source files to /Users/mattmann/tmp/tika1.15/tika-dl/target/classes
          [INFO]
          [INFO] --- maven-resources-plugin:2.7:testResources (default-testResources) @ tika-dl ---
          [INFO] Using 'UTF-8' encoding to copy filtered resources.
          [INFO] Copying 4 resources
          [INFO] Copying 3 resources
          [INFO]
          [INFO] --- maven-compiler-plugin:3.2:testCompile (default-testCompile) @ tika-dl ---
          [INFO] Changes detected - recompiling the module!
          [INFO] Compiling 2 source files to /Users/mattmann/tmp/tika1.15/tika-dl/target/test-classes
          [INFO]
          [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ tika-dl ---
          [INFO] Surefire report directory: /Users/mattmann/tmp/tika1.15/tika-dl/target/surefire-reports

          -------------------------------------------------------
          T E S T S
          -------------------------------------------------------
          Running org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
          SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
          SLF4J: Defaulting to no-operation (NOP) logger implementation
          SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.691 sec - in org.apache.tika.dl.imagerec.DL4JInceptionV3NetTest
          Running org.apache.tika.dl.imagerec.DL4JVGG16NetTest
          Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 130.047 sec <<< FAILURE! - in org.apache.tika.dl.imagerec.DL4JVGG16NetTest
          recognise(org.apache.tika.dl.imagerec.DL4JVGG16NetTest) Time elapsed: 130.047 sec <<< ERROR!
          java.lang.OutOfMemoryError: Cannot allocate new FloatPointer(102760448): totalBytes = 1G, physicalBytes = 2G
          at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:568)
          at org.bytedeco.javacpp.Pointer.init(Pointer.java:121)
          at org.bytedeco.javacpp.FloatPointer.allocateArray(Native Method)
          at org.bytedeco.javacpp.FloatPointer.<init>(FloatPointer.java:68)
          at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:445)
          at org.nd4j.linalg.api.buffer.FloatBuffer.<init>(FloatBuffer.java:57)
          at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createFloat(DefaultDataBufferFactory.java:236)
          at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1301)
          at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1275)
          at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:252)
          at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:109)
          at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:247)
          at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4768)
          at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.toFlattened(CpuNDArrayFactory.java:502)
          at org.nd4j.linalg.factory.BaseNDArrayFactory.toFlattened(BaseNDArrayFactory.java:321)
          at org.nd4j.linalg.factory.Nd4j.toFlattened(Nd4j.java:1846)
          at org.deeplearning4j.nn.weights.WeightInitUtil.initWeights(WeightInitUtil.java:111)
          at org.deeplearning4j.nn.weights.WeightInitUtil.initWeights(WeightInitUtil.java:61)
          at org.deeplearning4j.nn.params.DefaultParamInitializer.createWeightMatrix(DefaultParamInitializer.java:145)
          at org.deeplearning4j.nn.params.DefaultParamInitializer.createWeightMatrix(DefaultParamInitializer.java:133)
          at org.deeplearning4j.nn.params.DefaultParamInitializer.init(DefaultParamInitializer.java:82)
          at org.deeplearning4j.nn.conf.layers.DenseLayer.instantiate(DenseLayer.java:56)
          at org.deeplearning4j.nn.conf.graph.LayerVertex.instantiate(LayerVertex.java:92)
          at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:370)
          at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:274)
          at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:483)
          at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:471)
          at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:178)
          at org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModelHelper.loadModel(TrainedModelHelper.java:70)
          at org.apache.tika.dl.imagerec.DL4JVGG16Net.initialize(DL4JVGG16Net.java:102)
          at org.apache.tika.parser.recognition.ObjectRecognitionParser.initialize(ObjectRecognitionParser.java:101)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:638)
          at org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:550)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:187)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:168)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:161)
          at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:157)
          at org.apache.tika.dl.imagerec.DL4JVGG16NetTest.recognise(DL4JVGG16NetTest.java:31)

          Results :

          Tests in error:
          DL4JVGG16NetTest.recognise:31 » OutOfMemory Cannot allocate new FloatPointer(1...

          Tests run: 2, Failures: 0, Errors: 1, Skipped: 0

          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD FAILURE
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 02:19 min
          [INFO] Finished at: 2017-07-05T16:06:29-07:00
          [INFO] Final Memory: 61M/1020M
          [INFO] ------------------------------------------------------------------------
          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project tika-dl: There are test failures.
          [ERROR]
          [ERROR] Please refer to /Users/mattmann/tmp/tika1.15/tika-dl/target/surefire-reports for the individual test results.
          [ERROR] -> [Help 1]
          [ERROR]
          [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
          [ERROR] Re-run Maven using the -X switch to enable full debug logging.
          [ERROR]
          [ERROR] For more information about the errors and possible solutions, please read the following articles:
          [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
          LMC-053601:tika-dl mattmann$
          ```
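The element count in the `OutOfMemoryError` above can be checked with a little arithmetic. The sketch below assumes the failing allocation is the weight matrix of VGG16's first fully connected layer (7 x 7 x 512 = 25088 inputs, 4096 outputs in the standard architecture). The trace does pass through `DenseLayer.instantiate`, but the link to that specific layer is an inference, not something the log states. `Vgg16DenseLayerSize` is a hypothetical name.

```java
// Worked arithmetic for FloatPointer(102760448); not Tika or DL4J code.
final class Vgg16DenseLayerSize {
    private Vgg16DenseLayerSize() {}

    /** Bytes needed for a buffer holding elementCount 32-bit floats. */
    static long floatBufferBytes(long elementCount) {
        return elementCount * Float.BYTES;
    }

    public static void main(String[] args) {
        // Assumed layer shape: 25088 inputs x 4096 outputs.
        long weights = 25088L * 4096L;
        long bytes = floatBufferBytes(weights);
        System.out.println("Weights: " + weights);                    // 102760448
        System.out.println("Bytes:   " + bytes);                      // 411041792
        System.out.println("MiB:     " + bytes / (1024L * 1024L));    // 392
    }
}
```

A single buffer of about 392 MiB is a large request when the log reports `totalBytes = 1G` and Maven reports `Final Memory: 61M/1020M`. As far as I know, JavaCPP derives its off-heap limits from the JVM's maximum heap by default, which would be consistent with these numbers, though the thread itself does not confirm this.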

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313190499

          @thammegowda
          Thanks for the reply. It would be fine if it is released in 1.16, but is it working fine?
          A code review would be extremely useful so that I can fix any issues that may be present. This would ensure smooth integration of this branch into the main branch.
          Thanks.

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313128325

          Thanks for pushing the changes.
          We probably have to hold this for the release of Tika 1.16.

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-313001290

          @chrismattmann @thammegowda any update, guys?

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-311165331

          Hey guys,
          @chrismattmann @thammegowda I fixed the pending issues; please review.

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-310470419

          Looks like the PR was merged, @thammegowda and @asmehra95, thanks. Let's work on the pending things now, and I'll be ready to test when done.

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-310235057

          @asmehra95 Please review the comments.

          Since this PR has been sitting for a long time, I made changes and sent a PR to your branch https://github.com/asmehra95/tika/pull/2
          Please review and merge it (so that the changes will show up here)

          Pending:
            1. We need to use the code in `TrainedModels.VGG16.decodePredictions(...)` to support retrieval of `topN` objects. See https://github.com/apache/tika/pull/182#discussion_r123391080
            2. It looks like the confidence is in the range [0.0 - 100.0]; we need to bring it to the range [0.0 - 1.0] to make it consistent with other implementations.
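The two pending items can be sketched together in plain Java: rescale percent-style scores to [0.0 - 1.0], drop anything below `minConfidence`, and keep the best `topN`. This is only an illustration under stated assumptions. It does not call `TrainedModels.VGG16.decodePredictions(...)`, and `TopNSketch`, `Scored`, and `topN(...)` are hypothetical names, not the code that ended up in the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Tika code): normalise, filter, and rank predictions.
final class TopNSketch {
    private TopNSketch() {}

    /** A label paired with a confidence in the range [0.0, 1.0]. */
    static final class Scored {
        final String label;
        final double confidence;

        Scored(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    /**
     * Converts scores given in [0, 100] to [0, 1], keeps those at or above
     * minConfidence, and returns at most topN of them, best first.
     */
    static List<Scored> topN(String[] labels, double[] percentScores,
                             int topN, double minConfidence) {
        List<Scored> kept = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            double confidence = percentScores[i] / 100.0;
            if (confidence >= minConfidence) {
                kept.add(new Scored(labels[i], confidence));
            }
        }
        // Highest confidence first.
        kept.sort((a, b) -> Double.compare(b.confidence, a.confidence));
        return new ArrayList<>(kept.subList(0, Math.min(topN, kept.size())));
    }

    public static void main(String[] args) {
        String[] labels = {"tiger", "lion", "cat"};
        double[] scores = {2.0, 97.0, 0.5};
        for (Scored s : topN(labels, scores, 2, 0.015)) {
            System.out.println(s.label + " (" + s.confidence + ")");
        }
    }
}
```

Doing the rescaling before the `minConfidence` comparison matters here: a threshold such as 0.015 only makes sense once the scores are on the [0.0 - 1.0] scale.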

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r123379000

          ##########
          File path: tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml
          ##########
          @@ -0,0 +1,32 @@
          +<?xml version="1.0" encoding="UTF-8"?>
          +
          +<!--
          + ~ Licensed to the Apache Software Foundation (ASF) under one or more
          + ~ contributor license agreements. See the NOTICE file distributed with
          + ~ this work for additional information regarding copyright ownership.
          + ~ The ASF licenses this file to You under the Apache License, Version 2.0
          + ~ (the "License"); you may not use this file except in compliance with
          + ~ the License. You may obtain a copy of the License at
          + ~
          + ~ http://www.apache.org/licenses/LICENSE-2.0
          + ~
          + ~ Unless required by applicable law or agreed to in writing, software
          + ~ distributed under the License is distributed on an "AS IS" BASIS,
          + ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + ~ See the License for the specific language governing permissions and
          + ~ limitations under the License.
          + -->
          +<properties>
          + <parsers>
          + <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
          + <mime>image/jpeg</mime>
          + <params>
          + <param name="topN" type="int">2</param>
          + <param name="minConfidence" type="double">0.015</param>
          + <param name="class" type="string">org.apache.tika.dl.imagerec.DL4JVGG16Net</param>
          + <param name="modelType" type="string">VGG16</param>
          + <param name="serialize" type="string">yes</param>

          Review comment:
          Let's make it
          ```xml
          <param name="serialize" type="bool">true</param>
          ```

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r123391080

          ##########
          File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
          ##########
          @@ -0,0 +1,161 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.tika.dl.imagerec;
          +
          +import org.apache.tika.config.Field;
          +import org.apache.tika.config.Param;
          +import org.apache.tika.exception.TikaConfigException;
          +import org.apache.tika.exception.TikaException;
          +import org.apache.tika.metadata.Metadata;
          +import org.apache.tika.mime.MediaType;
          +import org.apache.tika.parser.ParseContext;
          +import org.apache.tika.parser.external.ExternalParser;
          +import org.apache.tika.parser.recognition.ObjectRecogniser;
          +import org.apache.tika.parser.recognition.RecognisedObject;
          +import org.datavec.image.loader.NativeImageLoader;
          +import org.deeplearning4j.nn.graph.ComputationGraph;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModelHelper;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModels;
          +import org.deeplearning4j.util.ModelSerializer;
          +import org.nd4j.linalg.api.ndarray.INDArray;
          +import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
          +import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +import org.xml.sax.ContentHandler;
          +import org.xml.sax.SAXException;
          +
          +import java.io.File;
          +import java.io.IOException;
          +import java.io.InputStream;
          +import java.util.*;
          +import java.util.regex.Pattern;
          +
          +public class DL4JVGG16Net extends ExternalParser implements ObjectRecogniser {
          +
          + private static final Logger LOG = LoggerFactory.getLogger(DL4JVGG16Net.class);
          + public static final Set<MediaType> SUPPORTED_MIMES = Collections.singleton(MediaType.image("jpeg"));
          + private static final LineConsumer IGNORED_LINE_LOGGER = new LineConsumer() {
          + @Override
          + public void consume(String line) {
          + LOG.debug(line);
          + }
          + };
          + private static final String HOME_DIR = System.getProperty("user.home");
          + private static final String BASE_DIR = ".dl4j/trainedmodels";
          + private static String MODEL_DIR = HOME_DIR + File.separator + BASE_DIR;
          + private static String MODEL_DIR_PREPROCESSED = MODEL_DIR + File.separator + "tikaPreprocessed" + File.separator;
          + @Field
          + private String modelType = "VGG16";
          + @Field
          + private File modelFile;
          + @Field
          + private String outPattern = "(.*) (score = ([0-9]\\.[0-9]) )$";
          + @Field
          + private String serialize = "yes";
          + private File locationToSave;
          + private boolean available = false;
          + private ComputationGraph model;
          +
          + public Set<MediaType> getSupportedMimes() {
          + return SUPPORTED_MIMES;
          + }
          +
          + @Override
          + public boolean isAvailable() {
          + return available;
          + }
          +
          + @Override
          + public void initialize(Map<String, Param> params) throws TikaConfigException {
          + try {
          + TrainedModelHelper helper;
          + switch (modelType) {
          + case "VGG16NOTOP":
          + throw new TikaConfigException("VGG16NOTOP is not supported right now");
          + /*# TODO hookup VGGNOTOP by uncommenting following code once the issue is resolved by dl4j team
          + modelFile = new File(MODEL_DIR_PREPROCESSED+File.separator+"vgg16_notop.zip");
          + locationToSave= new File(MODEL_DIR+File.separator+"tikaPreprocessed"+File.separator+"vgg16.zip");
          + helper = new TrainedModelHelper(TrainedModels.VGG16NOTOP);
          + break;*/
          + case "VGG16":
          + helper = new TrainedModelHelper(TrainedModels.VGG16);
          + modelFile = new File(MODEL_DIR_PREPROCESSED + File.separator + "vgg16.zip");
          + locationToSave = new File(MODEL_DIR + File.separator + "tikaPreprocessed" + File.separator + "vgg16.zip");
          + break;
          + default:
          + throw new TikaConfigException("Unknown or unsupported model");
          + }
          + if (serialize.trim().toLowerCase(Locale.ROOT).equals("yes")) {
          + if (!modelFile.exists()) {
          + LOG.warn("Preprocessed Model doesn't exist at {}", modelFile);
          + modelFile.getParentFile().mkdirs();
          + model = helper.loadModel();
          + LOG.info("Saving the Loaded model for future use. Saved models are more optimised to consume less resources.");
          + ModelSerializer.writeModel(model, locationToSave, true);
          + available = true;
          + } else {
          + model = ModelSerializer.restoreComputationGraph(locationToSave);
          + LOG.info("Preprocessed Model Loaded from {}", locationToSave);
          + available = true;
          + }
          +
          + } else if (serialize.trim().toLowerCase(Locale.ROOT).equals("no")) {
          + LOG.info("Weight graph model loaded via dl4j Helper functions");
          + model = helper.loadModel();
          + available = true;
          + } else {
          + throw new TikaConfigException("Configuration Error. serialization can be either yes or no.");
          + }
          +
          + if (!available) {
          + return;
          + }
          + HashMap<Pattern, String> patterns = new HashMap<>();
          + patterns.put(Pattern.compile(outPattern), null);
          + setMetadataExtractionPatterns(patterns);
          + setIgnoredLineConsumer(IGNORED_LINE_LOGGER);
          + } catch (Exception e) {
          + LOG.warn("exception occured");
          + throw new TikaConfigException(e.getMessage(), e);
          + }
          + }
          +
          + @Override
          + public List<RecognisedObject> recognise(InputStream stream, ContentHandler handler,
          + Metadata metadata, ParseContext context)
          + throws IOException, SAXException, TikaException {
          + NativeImageLoader loader = new NativeImageLoader(224, 224, 3);
          + INDArray image = loader.asMatrix(stream);
          + DataNormalization scaler = new VGG16ImagePreProcessor();
          + scaler.transform(image);
          + INDArray[] output = model.output(false, image);
          + String modelOutput = TrainedModels.VGG16.decodePredictions(output[0]);

          Review comment:
          We need to use the code in `TrainedModels.VGG16.decodePredictions(...)` to get the `topN` objects.
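A rough sketch of the top-N selection this comment asks for, written against plain arrays so it does not depend on any Deeplearning4j call. It assumes the class probabilities have already been copied out of `output[0]` into a `double[]` on the [0.0 - 1.0] scale and that a parallel array of labels is available; `TopNSketch`, `Scored`, and both array parameters are hypothetical names for illustration, and this is not how `decodePredictions` itself is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; names are not from Tika or Deeplearning4j.
final class TopNSketch {

    /** A label paired with its confidence, assumed to be in [0.0, 1.0]. */
    static final class Scored {
        final String label;
        final double confidence;

        Scored(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    private TopNSketch() {
    }

    /**
     * Returns at most topN entries whose confidence is at least
     * minConfidence, ordered from most to least confident.
     */
    static List<Scored> topN(double[] probs, String[] labels,
                             int topN, double minConfidence) {
        if (probs.length != labels.length) {
            throw new IllegalArgumentException("probs and labels differ in length");
        }
        List<Scored> candidates = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] >= minConfidence) {
                candidates.add(new Scored(labels[i], probs[i]));
            }
        }
        // Sort by confidence, highest first.
        candidates.sort(
                Comparator.comparingDouble((Scored s) -> s.confidence).reversed());
        int limit = Math.max(0, Math.min(topN, candidates.size()));
        return new ArrayList<>(candidates.subList(0, limit));
    }
}
```

In `recognise(...)`, each returned entry could then be mapped to a `RecognisedObject`, with `topN` and `minConfidence` taken from the parser config rather than hard-coded.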

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r123390913

          ##########
          File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
          ##########
          @@ -0,0 +1,161 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.tika.dl.imagerec;
          +
          +import org.apache.tika.config.Field;
          +import org.apache.tika.config.Param;
          +import org.apache.tika.exception.TikaConfigException;
          +import org.apache.tika.exception.TikaException;
          +import org.apache.tika.metadata.Metadata;
          +import org.apache.tika.mime.MediaType;
          +import org.apache.tika.parser.ParseContext;
          +import org.apache.tika.parser.external.ExternalParser;
          +import org.apache.tika.parser.recognition.ObjectRecogniser;
          +import org.apache.tika.parser.recognition.RecognisedObject;
          +import org.datavec.image.loader.NativeImageLoader;
          +import org.deeplearning4j.nn.graph.ComputationGraph;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModelHelper;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModels;
          +import org.deeplearning4j.util.ModelSerializer;
          +import org.nd4j.linalg.api.ndarray.INDArray;
          +import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
          +import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +import org.xml.sax.ContentHandler;
          +import org.xml.sax.SAXException;
          +
          +import java.io.File;
          +import java.io.IOException;
          +import java.io.InputStream;
          +import java.util.*;
          +import java.util.regex.Pattern;
          +
          +public class DL4JVGG16Net extends ExternalParser implements ObjectRecogniser {

          Review comment:
          I do not think this class actually needs to extend `ExternalParser`.

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r123379685

          ##########
          File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
          ##########
          @@ -0,0 +1,161 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.tika.dl.imagerec;
          +
          +import org.apache.tika.config.Field;
          +import org.apache.tika.config.Param;
          +import org.apache.tika.exception.TikaConfigException;
          +import org.apache.tika.exception.TikaException;
          +import org.apache.tika.metadata.Metadata;
          +import org.apache.tika.mime.MediaType;
          +import org.apache.tika.parser.ParseContext;
          +import org.apache.tika.parser.external.ExternalParser;
          +import org.apache.tika.parser.recognition.ObjectRecogniser;
          +import org.apache.tika.parser.recognition.RecognisedObject;
          +import org.datavec.image.loader.NativeImageLoader;
          +import org.deeplearning4j.nn.graph.ComputationGraph;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModelHelper;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModels;
          +import org.deeplearning4j.util.ModelSerializer;
          +import org.nd4j.linalg.api.ndarray.INDArray;
          +import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
          +import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +import org.xml.sax.ContentHandler;
          +import org.xml.sax.SAXException;
          +
          +import java.io.File;
          +import java.io.IOException;
          +import java.io.InputStream;
          +import java.util.*;
          +import java.util.regex.Pattern;
          +
          +public class DL4JVGG16Net extends ExternalParser implements ObjectRecogniser {
          +
          + private static final Logger LOG = LoggerFactory.getLogger(DL4JVGG16Net.class);
          + public static final Set<MediaType> SUPPORTED_MIMES = Collections.singleton(MediaType.image("jpeg"));
          + private static final LineConsumer IGNORED_LINE_LOGGER = new LineConsumer() {
          + @Override
          + public void consume(String line) {
          + LOG.debug(line);
          + }
          + };
          + private static final String HOME_DIR = System.getProperty("user.home");
          + private static final String BASE_DIR = ".dl4j/trainedmodels";
          + private static String MODEL_DIR = HOME_DIR + File.separator + BASE_DIR;
          + private static String MODEL_DIR_PREPROCESSED = MODEL_DIR + File.separator + "tikaPreprocessed" + File.separator;
          + @Field
          + private String modelType = "VGG16";
          + @Field
          + private File modelFile;
          + @Field
          + private String outPattern = "(.*) (score = ([0-9]\\.[0-9]) )$";
          + @Field
          + private String serialize = "yes";
          + private File locationToSave;
          + private boolean available = false;
          + private ComputationGraph model;
          +
          + public Set<MediaType> getSupportedMimes() {
          + return SUPPORTED_MIMES;
          + }
          +
          + @Override
          + public boolean isAvailable() {
          + return available;
          + }
          +
          + @Override
          + public void initialize(Map<String, Param> params) throws TikaConfigException {
          + try {
          + TrainedModelHelper helper;
          + switch (modelType) {
          + case "VGG16NOTOP":
          + throw new TikaConfigException("VGG16NOTOP is not supported right now");
          + /*# TODO hookup VGGNOTOP by uncommenting following code once the issue is resolved by dl4j team
          + modelFile = new File(MODEL_DIR_PREPROCESSED+File.separator+"vgg16_notop.zip");
          + locationToSave= new File(MODEL_DIR+File.separator+"tikaPreprocessed"+File.separator+"vgg16.zip");
          + helper = new TrainedModelHelper(TrainedModels.VGG16NOTOP);
          + break;*/
          + case "VGG16":
          + helper = new TrainedModelHelper(TrainedModels.VGG16);
          + modelFile = new File(MODEL_DIR_PREPROCESSED + File.separator + "vgg16.zip");
          + locationToSave = new File(MODEL_DIR + File.separator + "tikaPreprocessed" + File.separator + "vgg16.zip");
          + break;
          + default:
          + throw new TikaConfigException("Unknown or unsupported model");
          + }
          + if (serialize.trim().toLowerCase(Locale.ROOT).equals("yes")) {
          + if (!modelFile.exists()) {
          + LOG.warn("Preprocessed Model doesn't exist at {}", modelFile);
          + modelFile.getParentFile().mkdirs();
          + model = helper.loadModel();
          + LOG.info("Saving the Loaded model for future use. Saved models are more optimised to consume less resources.");
          + ModelSerializer.writeModel(model, locationToSave, true);
          + available = true;
          + } else {
          + model = ModelSerializer.restoreComputationGraph(locationToSave);
          + LOG.info("Preprocessed Model Loaded from {}", locationToSave);
          + available = true;
          + }
          +
          + } else if (serialize.trim().toLowerCase(Locale.ROOT).equals("no")) {
          + LOG.info("Weight graph model loaded via dl4j Helper functions");
          + model = helper.loadModel();
          + available = true;
          + } else {
          + throw new TikaConfigException("Configuration Error. serialization can be either yes or no.");
          + }
          +
          + if (!available) {
          + return;
          + }
          + HashMap<Pattern, String> patterns = new HashMap<>();
          + patterns.put(Pattern.compile(outPattern), null);
          + setMetadataExtractionPatterns(patterns);
          + setIgnoredLineConsumer(IGNORED_LINE_LOGGER);

          Review comment:
          These lines are not needed, right?
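For context on the question above: `setMetadataExtractionPatterns` is inherited from `ExternalParser`, which, as far as I understand it, matches such patterns against the output lines of an external command. The sketch below shows what a score pattern of this kind would capture. The exact pattern (the copy quoted in this thread is garbled), the class name `OutPatternDemo`, and the sample line are assumptions for illustration, not taken from the pull request.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OutPatternDemo {

    // Assumed form of the outPattern default; not confirmed by the quoted diff.
    static final Pattern OUT_PATTERN =
            Pattern.compile("(.*) \\(score = ([0-9]+\\.[0-9]+)\\)$");

    /** Returns {label, score} for a matching line, or null when the line does not match. */
    static String[] parse(String line) {
        Matcher m = OUT_PATTERN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        // Hypothetical output line of an external classifier.
        String[] hit = parse("lion (score = 0.98)");
        System.out.println(hit[0] + " -> " + hit[1]);
        System.out.println(parse("no score here") == null);
    }
}
```

Since DL4JVGG16Net runs the model in-process and reads no command output, nothing in the quoted diff appears to consume this pattern, which is presumably the reviewer's point.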


          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on a change in pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#discussion_r123379864

          ##########
          File path: tika-dl/src/main/java/org/apache/tika/dl/imagerec/DL4JVGG16Net.java
          ##########
          @@ -0,0 +1,161 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + * <p>
          + * http://www.apache.org/licenses/LICENSE-2.0
          + * <p>
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.tika.dl.imagerec;
          +
          +import org.apache.tika.config.Field;
          +import org.apache.tika.config.Param;
          +import org.apache.tika.exception.TikaConfigException;
          +import org.apache.tika.exception.TikaException;
          +import org.apache.tika.metadata.Metadata;
          +import org.apache.tika.mime.MediaType;
          +import org.apache.tika.parser.ParseContext;
          +import org.apache.tika.parser.external.ExternalParser;
          +import org.apache.tika.parser.recognition.ObjectRecogniser;
          +import org.apache.tika.parser.recognition.RecognisedObject;
          +import org.datavec.image.loader.NativeImageLoader;
          +import org.deeplearning4j.nn.graph.ComputationGraph;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModelHelper;
          +import org.deeplearning4j.nn.modelimport.keras.trainedmodels.TrainedModels;
          +import org.deeplearning4j.util.ModelSerializer;
          +import org.nd4j.linalg.api.ndarray.INDArray;
          +import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
          +import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +import org.xml.sax.ContentHandler;
          +import org.xml.sax.SAXException;
          +
          +import java.io.File;
          +import java.io.IOException;
          +import java.io.InputStream;
          +import java.util.*;
          +import java.util.regex.Pattern;
          +
          +public class DL4JVGG16Net extends ExternalParser implements ObjectRecogniser {
          +
          + private static final Logger LOG = LoggerFactory.getLogger(DL4JVGG16Net.class);
          + public static final Set<MediaType> SUPPORTED_MIMES = Collections.singleton(MediaType.image("jpeg"));
          + private static final LineConsumer IGNORED_LINE_LOGGER = new LineConsumer() {
          + @Override
          + public void consume(String line) {
          + LOG.debug(line);
          + }
          + };
          + private static final String HOME_DIR = System.getProperty("user.home");
          + private static final String BASE_DIR = ".dl4j/trainedmodels";
          + private static String MODEL_DIR = HOME_DIR + File.separator + BASE_DIR;
          + private static String MODEL_DIR_PREPROCESSED = MODEL_DIR + File.separator + "tikaPreprocessed" + File.separator;
          + @Field
          + private String modelType = "VGG16";
          + @Field
          + private File modelFile;
          + @Field
          + private String outPattern = "(.*) \\(score = ([0-9]+\\.[0-9]+)\\)$";
          + @Field
          + private String serialize = "yes";
          + private File locationToSave;
          + private boolean available = false;
          + private ComputationGraph model;
          +
          + public Set<MediaType> getSupportedMimes() {
          + return SUPPORTED_MIMES;
          + }
          +
          + @Override
          + public boolean isAvailable() {
          + return available;
          + }
          +
          + @Override
          + public void initialize(Map<String, Param> params) throws TikaConfigException {
          + try {
          + TrainedModelHelper helper;
          + switch (modelType) {
          + case "VGG16NOTOP":
          + throw new TikaConfigException("VGG16NOTOP is not supported right now");
          + /*# TODO hookup VGGNOTOP by uncommenting following code once the issue is resolved by dl4j team
          + modelFile = new File(MODEL_DIR_PREPROCESSED+File.separator+"vgg16_notop.zip");
          + locationToSave= new File(MODEL_DIR+File.separator+"tikaPreprocessed"+File.separator+"vgg16.zip");
          + helper = new TrainedModelHelper(TrainedModels.VGG16NOTOP);
          + break;*/
          + case "VGG16":
          + helper = new TrainedModelHelper(TrainedModels.VGG16);
          + modelFile = new File(MODEL_DIR_PREPROCESSED + File.separator + "vgg16.zip");
          + locationToSave = new File(MODEL_DIR + File.separator + "tikaPreprocessed" + File.separator + "vgg16.zip");
          + break;
          + default:
          + throw new TikaConfigException("Unknown or unsupported model");
          + }
          + if (serialize.trim().toLowerCase(Locale.ROOT).equals("yes")) {
          + if (!modelFile.exists()) {
          + LOG.warn("Preprocessed Model doesn't exist at {}", modelFile);
          + modelFile.getParentFile().mkdirs();
          + model = helper.loadModel();
          + LOG.info("Saving the Loaded model for future use. Saved models are more optimised to consume less resources.");
          + ModelSerializer.writeModel(model, locationToSave, true);
          + available = true;
          + } else {
          + model = ModelSerializer.restoreComputationGraph(locationToSave);
          + LOG.info("Preprocessed Model Loaded from {}", locationToSave);
          + available = true;
          + }
          +
          + } else if (serialize.trim().toLowerCase(Locale.ROOT).equals("no")) {
          + LOG.info("Weight graph model loaded via dl4j Helper functions");
          + model = helper.loadModel();
          + available = true;
          + } else {
          + throw new TikaConfigException("Configuration Error. serialization can be either yes or no.");
          + }
          +
          + if (!available) {
          + return;
          + }
          + HashMap<Pattern, String> patterns = new HashMap<>();
          + patterns.put(Pattern.compile(outPattern), null);
          + setMetadataExtractionPatterns(patterns);
          + setIgnoredLineConsumer(IGNORED_LINE_LOGGER);
          + } catch (Exception e) {
          + LOG.warn("exception occured");
          + throw new TikaConfigException(e.getMessage(), e);
          + }
          + }
          +
          + @Override
          + public List<RecognisedObject> recognise(InputStream stream, ContentHandler handler,
          + Metadata metadata, ParseContext context)
          + throws IOException, SAXException, TikaException {
          + NativeImageLoader loader = new NativeImageLoader(224, 224, 3);

          Review comment:
          Can the `loader` object be declared as a member variable and then reused here?
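A minimal sketch of what this suggestion amounts to: construct the loader once as a field and reuse it on every call. `ImageLoader` here is a hypothetical stand-in (so the example runs without DL4J on the classpath), and `LoaderReuseSketch` is an illustrative name, not the class from the pull request. Whether the real `NativeImageLoader` can safely be shared across threads is not established in this thread and would need checking before applying the change.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LoaderReuseSketch {

    /** Hypothetical stand-in for a loader that is costly to construct. */
    static final class ImageLoader {
        static final AtomicInteger CREATED = new AtomicInteger();
        private final int height;
        private final int width;
        private final int channels;

        ImageLoader(int height, int width, int channels) {
            this.height = height;
            this.width = width;
            this.channels = channels;
            CREATED.incrementAndGet(); // count constructions for the demo
        }

        /** Number of values in one loaded image tensor. */
        int valuesPerImage() {
            return height * width * channels;
        }
    }

    // Built once per recogniser instance rather than once per recognise() call.
    private final ImageLoader loader = new ImageLoader(224, 224, 3);

    int recognise() {
        return loader.valuesPerImage();
    }

    public static void main(String[] args) {
        LoaderReuseSketch recogniser = new LoaderReuseSketch();
        for (int i = 0; i < 3; i++) {
            recogniser.recognise();
        }
        System.out.println("loaders created: " + ImageLoader.CREATED.get());
    }
}
```

With the loader held in a field, three calls to `recognise()` construct it only once.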


          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-310216712

          @asmehra95 Sorry for the delay (vacations!). Reviewing it today


          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-306096976

          @chrismattmann ping.. any update?


          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-304462311

          @thammegowda @chrismattmann awaiting review for this pull request...


          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182#issuecomment-304462094

          frickin' awesome! I'm going to test this today @asmehra95


          githubbot ASF GitHub Bot added a comment -

          chrismattmann closed pull request #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159


          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-304461966

          superseded by #182


          githubbot ASF GitHub Bot added a comment -

          asmehra95 opened a new pull request #182: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j into tika-dl
          URL: https://github.com/apache/tika/pull/182

          Note: This is a modified form of #159, which I raised earlier.
          I have imported the VGG16 model into the tika-dl module using Deeplearning4j.
          This recogniser is used in much the same way as TensorFlowRESTrecogniser, but it doesn't require any external setup, such as running a REST service.
          You can read more about TensorFlowRESTrecogniser at https://wiki.apache.org/tika/TikaAndVision

          To use DL4JVGG16Net, set the class param to org.apache.tika.dl.imagerec.DL4JVGG16Net and modelType to VGG16.
          A sample configuration is given below for reference.

          ```
          <?xml version="1.0" encoding="UTF-8"?>
          <properties>
            <parsers>
              <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
                <mime>image/jpeg</mime>
                <params>
                  <param name="topN" type="int">2</param>
                  <param name="minConfidence" type="double">0.015</param>
                  <param name="class" type="string">org.apache.tika.dl.imagerec.DL4JVGG16Net</param>
                  <param name="modelType" type="string">VGG16</param>
                  <param name="serialize" type="string">yes</param>
                </params>
              </parser>
            </parsers>
          </properties>
          ```
          Save the configuration at your preferred location.
          A default one is provided at ``` tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml ```
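
To make the parameter names in the sample configuration concrete, here is a minimal sketch that reads the `<param>` entries back with the JDK's built-in DOM parser. It is not how Tika itself loads the file; the class `RecogniserConfigPeek` and its method `readParams` are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not part of Tika: lists the <param> entries of a parser config. */
final class RecogniserConfigPeek {

    /** Returns param name -> trimmed text value, in document order. */
    static Map<String, String> readParams(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("param");
            Map<String, String> params = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element param = (Element) nodes.item(i);
                params.put(param.getAttribute("name"), param.getTextContent().trim());
            }
            return params;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse config XML", e);
        }
    }

    public static void main(String[] args) {
        // Same parameters as the sample configuration, inlined as a string.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.recognition.ObjectRecognitionParser\">"
                + "<mime>image/jpeg</mime><params>"
                + "<param name=\"topN\" type=\"int\">2</param>"
                + "<param name=\"minConfidence\" type=\"double\">0.015</param>"
                + "<param name=\"class\" type=\"string\">org.apache.tika.dl.imagerec.DL4JVGG16Net</param>"
                + "<param name=\"modelType\" type=\"string\">VGG16</param>"
                + "<param name=\"serialize\" type=\"string\">yes</param>"
                + "</params></parser></parsers></properties>";
        readParams(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running `main` lists the five parameters, which can be a quick sanity check that an edited config is still well-formed XML before handing it to Tika.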

          To run it with the default configuration, build the project, move to the root directory of the project, and run the following command.

          ``` java -Xmx3G -cp ./tika-dl/target/tika-dl-1.15-SNAPSHOT-jar-with-dependencies.jar;tika-app/target/tika-app-1.15-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg```
          -Xmx3G is required because the VGG16 model needs quite a lot of memory to run.
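
One portability detail in the command above: it joins the two classpath entries with `;`, which is the Windows separator, whereas Linux and macOS use `:`. The sketch below assembles the same command with `File.pathSeparator`, so the separator follows the platform. The class name `TikaDl4jCommand` and its methods are hypothetical; the jar names and main class are copied from the command above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the TikaCLI command line with a platform-appropriate classpath. */
final class TikaDl4jCommand {

    /** Builds the argument list; 'separator' is ";" on Windows and ":" elsewhere. */
    static List<String> build(String separator, String configPath, String imagePath) {
        String classpath = String.join(separator,
                "./tika-dl/target/tika-dl-1.15-SNAPSHOT-jar-with-dependencies.jar",
                "tika-app/target/tika-app-1.15-SNAPSHOT.jar");
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx3G"); // heap limit taken from the PR description
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("org.apache.tika.cli.TikaCLI");
        cmd.add("--config=" + configPath);
        cmd.add(imagePath);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(File.pathSeparator,
                "tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-vgg16-config.xml",
                "tika-dl/src/test/resources/org/apache/tika/dl/imagerec/lion.jpg");
        // Only prints the command. To launch it from the project root instead:
        // new ProcessBuilder(cmd).inheritIO().start();
        System.out.println(String.join(" ", cmd));
    }
}
```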
          Observations:
          When loading the serialized model from disk:
          It requires only around 1200 MB of RAM to run.

          When the model is loaded from h5 files using helper functions:
          It requires 2500 MB of RAM to run the model (needed only once if serialize is set to yes).

          When the model runs, it automatically downloads the model file, using DL4J helper functions, to .dl4j/trainedModels locally.
          To speed up future runs, once the model has been loaded from the original h5 files, it is serialized and saved on disk at .dl4j/trainedModels/tikaPreprocessed, which significantly reduces
          the resource usage (especially memory consumption) for future loads.
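
The load-once-then-cache behaviour described above can be sketched as follows. This is only an illustration of the control flow, under stated assumptions: a `byte[]` stands in for the real DL4J model object, the `Supplier` stands in for the expensive import from the original files, and `SerializeOnceCache` is a hypothetical name, not a class from the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Supplier;

/** Hypothetical sketch of "import once, then reuse the serialized copy". */
final class SerializeOnceCache {

    /**
     * Returns the cached model if cacheFile exists; otherwise runs slowImport
     * and, when serialize is true, writes the result to cacheFile for next time.
     */
    static byte[] load(Path cacheFile, boolean serialize, Supplier<byte[]> slowImport) {
        try {
            if (Files.exists(cacheFile)) {
                return Files.readAllBytes(cacheFile); // cheap path on later runs
            }
            byte[] model = slowImport.get(); // expensive path, first run only if cached
            if (serialize) {
                Path parent = cacheFile.toAbsolutePath().getParent();
                if (parent != null) {
                    Files.createDirectories(parent);
                }
                Files.write(cacheFile, model);
            }
            return model;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed demo location under the system temp directory.
        Path cache = Paths.get(System.getProperty("java.io.tmpdir"),
                "tika-dl-demo-" + System.nanoTime(), "tikaPreprocessed", "model.bin");
        int[] imports = {0};
        Supplier<byte[]> fakeImport = () -> {
            imports[0]++;
            return new byte[] {1, 2, 3}; // placeholder bytes, not a real model
        };
        load(cache, true, fakeImport);
        load(cache, true, fakeImport);
        System.out.println("expensive imports performed: " + imports[0]);
    }
}
```

With `serialize` set to true the expensive import runs once and later calls read the cached file; with it set to false every call pays the import cost again, which matches the trade-off the observations describe.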
          Issue Link:
          https://issues.apache.org/jira/browse/TIKA-2298

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-302990927

          ping @asmehra95 any update?

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-301683734

          Yes, sure! I am on it, @chrismattmann.
          I will raise the PR as soon as possible.

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-300577734

          guys #165 is now committed, so can this be updated to be inside Tika-DL? @asmehra95 @thammegowda

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-293461556

          @thammegowda Thank you for your comment.
          I will open a pull request once tika-dl gets merged.

          githubbot ASF GitHub Bot added a comment -

          thammegowda commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-293358840

          @asmehra95 I appreciate your effort. Thanks for updating the code based on our review.

          1. I feel this PR should be raised against the `tika-dl` module that is being proposed in #165, so that we can isolate the DL4J dependencies in that module instead of `tika-parsers`. We have to wait until the #165 PR gets merged and then move your classes inside the tika-dl module.
          2. I am not sure what's happening with the online/offline issue. It seems to me that one of the necessary files is missing (the Keras JSON model, the weights, or the labels), so it tries to download it from S3. I will take a closer look and report my findings.

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-293141458

          Hello folks,
          I have fixed the formatting issues. @thammegowda, please review it and let me know if any changes are required.
          I have made it a little more customizable: you can now choose whether or not to save the model to disk.
          Saving a model to disk requires a lot of memory (around 500 MB), but it saves a lot of runtime memory once the model is saved.

          How to use:
          Add a field to the config file:
          ```xml
          <param name="serialize" type="string">no</param>
          ```
          It can be yes or no.

          Observations:
          When loading the model from disk:
          It requires only around 1200 MB of RAM to run.

          When the model is loaded from h5 files using helper functions:
          It requires 2500 MB of RAM to run the model.

          I think we can distribute serialized models for VGG16 instead of the original h5 files. Would that cause any problems, @saudet @agibsonccc? One more thing: the VGG16 model doesn't work completely offline. It connects to the internet after processing the image in order to decode the output. Can we make it entirely offline?

          githubbot ASF GitHub Bot added a comment -

          asmehra95 commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-293166407

          @agibsonccc What I am saying is that, instead of downloading the image weights (h5 file), I could write functions that download the serialized model from my repo, because both are approximately the same size. The Tika user would then load directly from this serialized model, not from the image weights.

          What I am unsure about is whether the serialized model would work on all platforms. Does it have any platform dependency?
          The model will be serialized using
          ModelSerializer.writeModel(model, locationToSave, true);
          and loaded using
          model = ModelSerializer.restoreComputationGraph(locationToSave);

          Regarding the offline feature:

          When I try to decode predictions for an image offline, it produces an error. Apparently it goes online for decoding.
          Here is the stack trace when offline:
          https://gist.github.com/asmehra95/ac8bcfffbc5c1932d38a034d9b486c99

          githubbot ASF GitHub Bot added a comment -

          agibsonccc commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-293144691

          Not sure what you mean here. It needs to download the image weights once, not every time. You can try bundling the weights with the model if you want; alternatively, you can take the pretrained model, save it with dl4j, and then just bundle that with the jar.

          githubbot ASF GitHub Bot added a comment -

          saudet commented on issue #159: Creation of TIKA-2298 contributed by asmehra95- Import of vgg16 via Deeplearning4j
          URL: https://github.com/apache/tika/pull/159#issuecomment-293142844

          /cc @turambar would know more

          githubbot ASF GitHub Bot added a comment -

          GitHub user asmehra95 opened a pull request:

          https://github.com/apache/tika/pull/159

          fix for TIKA-2298 contributed by asmehra95

          I have imported the VGG16 model into Apache Tika using Deeplearning4j.
          This recogniser is used in much the same way as TensorFlowRESTrecogniser, but it doesn't require any external setup, such as the running REST service that TensorFlowRESTrecogniser needs.
          You can read more about TensorFlowRESTrecogniser at https://wiki.apache.org/tika/TikaAndVision

          To use DL4JImageRecogniser, set:
          the class param to org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser
          modelType to VGG16
          A sample configuration is given below for reference.
          <?xml version="1.0" encoding="UTF-8"?>
          <properties>
            <parsers>
              <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
                <mime>image/jpeg</mime>
                <params>
                  <param name="topN" type="int">5</param>
                  <param name="minConfidence" type="double">0.015</param>
                  <param name="class" type="string">org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser</param>
                  <param name="modelType" type="string">VGG16</param>
                </params>
              </parser>
            </parsers>
          </properties>
          Save the configuration at: tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest
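
The thread does not spell out how topN and minConfidence interact, so the following is a sketch under an assumption: predictions below minConfidence are dropped, and of the remainder the topN most confident are kept. The class `TopNFilter`, its nested `Prediction` type, and the sample labels and scores are all made up for illustration and are not taken from Tika or from any real model output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of applying minConfidence and topN to recogniser output. */
final class TopNFilter {

    /** A label with its confidence score in [0, 1]. */
    static final class Prediction {
        final String label;
        final double confidence;

        Prediction(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    /** Keeps predictions with confidence >= minConfidence, best first, at most topN. */
    static List<Prediction> select(List<Prediction> all, int topN, double minConfidence) {
        List<Prediction> kept = new ArrayList<>();
        for (Prediction p : all) {
            if (p.confidence >= minConfidence) {
                kept.add(p);
            }
        }
        kept.sort(Comparator.comparingDouble((Prediction p) -> p.confidence).reversed());
        int limit = Math.max(0, Math.min(topN, kept.size()));
        return new ArrayList<>(kept.subList(0, limit));
    }

    public static void main(String[] args) {
        // Invented scores, purely for demonstration.
        List<Prediction> all = Arrays.asList(
                new Prediction("tabby", 0.010),
                new Prediction("lion", 0.900),
                new Prediction("cheetah", 0.050),
                new Prediction("leopard", 0.020));
        for (Prediction p : select(all, 5, 0.015)) {
            System.out.println(p.label + " " + p.confidence);
        }
    }
}
```

Under this reading, topN=5 with minConfidence=0.015 can still return fewer than five labels when few predictions clear the threshold.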

          To run it, build the project, move to the root directory of the project, and run the command

          java -Xmx3G -jar tika-app/target/tika-app-1.14.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml <path to your image file>

          -Xmx3G is required because the VGG16 model needs quite a lot of memory to run. If your system is not able to run it, you may try increasing the memory limit further.

          When the model runs, it automatically downloads the model file, using DL4J helper functions, to .dl4j/trainedModels locally.
          To speed up future runs, once the model has been loaded from the original h5 files, it is serialized and saved on disk at .dl4j/trainedModels/tikaPreprocessed, which significantly reduces
          the resource usage (especially memory consumption) for future loads.
          For more details, you can read this gist: https://gist.github.com/asmehra95/a16c49ec91f7f0d7b39c5bf6c2483e4d
          Issue Link:
          https://issues.apache.org/jira/browse/TIKA-2298

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/asmehra95/tika master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tika/pull/159.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #159


          commit a5cd6f42dcded603f2b6de9476280c4bd95b6806
          Author: asmehra95 <asmehra95@gmail.com>
          Date: 2017-03-24T14:21:40Z

          Added dependencies for DL4JImageRecogniser parser

          commit f777f21b47c8d122e6b7a0819b44977f1d571c59
          Author: asmehra95 <asmehra95@gmail.com>
          Date: 2017-03-24T14:28:54Z

          Imported VGG16 model via deeplearning4j


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user asmehra95 opened a pull request: https://github.com/apache/tika/pull/159 fix for TIKA-2298 contributed by asmehra95 I have imported VGG16 model into Apache tika using deeplearning4j. The usage of this recogniser is very similar to TensorFlowRESTrecogniser but it doesn't require any external setup, like running RESTservice in as in case of TensorFlowRESTrecogniser. You can read more about TensorFlowRESTrecogniser at https://wiki.apache.org/tika/TikaAndVision To use the DL4JImageRecogniser set class param to org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser modelType to VGG16 sample configuration is given below for refference. <?xml version="1.0" encoding="UTF-8"?> <properties> <parsers> <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser"> <mime>image/jpeg</mime> <params> <param name="topN" type="int">5</param> <param name="minConfidence" type="double">0.015</param> <param name="class" type="string">org.apache.tika.parser.recognition.dl4j.DL4JImageRecogniser</param> <param name="modelType" type="string">VGG16</param> </params> </parser> </parsers> </properties> Save the configuration at : tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest To run it, build the project and move to root directory of the project and run the command java -Xmx3G -jar tika-app/target/tika-app-1.14.jar --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml <path to your image file> -Xmx3G is required because VGG16 model requires quite a lot of memory to run. 
If your system is not able to run it, you may try to pump up the memory further Once the model runs, it automatically downloads the model file using helper functions of DL4J locally at .dl4j/trainedModels To speed up the process in future, once the model is loaded from original hash files, it is serialized and saved on disk at .dl4j/trainedModels/tikaPreprocessed which significantly reduces the resource usage (specially memory consumption) for future loads. For more details you can red this gist: https://gist.github.com/asmehra95/a16c49ec91f7f0d7b39c5bf6c2483e4d Issue Link: https://issues.apache.org/jira/browse/TIKA-2298 You can merge this pull request into a Git repository by running: $ git pull https://github.com/asmehra95/tika master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tika/pull/159.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #159 commit a5cd6f42dcded603f2b6de9476280c4bd95b6806 Author: asmehra95 <asmehra95@gmail.com> Date: 2017-03-24T14:21:40Z Added dependencies for DL4JImageRecogniser parser commit f777f21b47c8d122e6b7a0819b44977f1d571c59 Author: asmehra95 <asmehra95@gmail.com> Date: 2017-03-24T14:28:54Z Imported VGG16 model via deeplearning4j
          thammegowda Thamme Gowda added a comment -

          Avtar Singh
          Please share a link to your code and I will have a look at it.

          Could you also refer to my example code at https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example and see what flags to pass to the importer (especially flags to disable further training)?

          A PR to that repo with your VGG16 example would be greatly appreciated!

          asmehra95 Avtar Singh added a comment -

          I am not able to run the VGG16 model in DL4J.
          When I try to run the full-fledged model, I get this error:
          Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new FloatPointer(138357544): totalBytes = 1G, physicalBytes = 2G
          at org.bytedeco.javacpp.FloatPointer.<init>(FloatPointer.java:76)
          at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:445)
          at org.nd4j.linalg.api.buffer.FloatBuffer.<init>(FloatBuffer.java:57)
          at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createFloat(DefaultDataBufferFactory.java:236)
          at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1301)
          at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1275)
          at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:252)
          at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:109)
          at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:247)
          at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4768)
          at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4726)
          at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3861)
          at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:342)
          at org.deeplearning4j.nn.graph.ComputationGraph.init(ComputationGraph.java:274)
          at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:483)
          at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:471)
          at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:178)
          at modelImport.ModelImportConfig.main(ModelImportConfig.java:18)
          Caused by: java.lang.OutOfMemoryError: Native allocator returned address == 0
          at org.bytedeco.javacpp.FloatPointer.<init>(FloatPointer.java:70)
          ... 17 more

          When I run the model variant labeled 'NoTop', it reports: Invalid configuration
          Looking at the source code of the helper functions, I found that the JSON file needs fixing.

          I am running on a 6th-gen i5 with 4 GB of RAM.
          I tried two operating systems: Ubuntu and Windows.
          Is there any way I can run it?
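
          A rough size check helps put the trace above in context. The failing call is FloatPointer(138357544), i.e. a request for 138,357,544 floats in one native buffer, which matches the commonly cited parameter count of VGG16. The sketch below only does the arithmetic for that single buffer; the class and method names are hypothetical, and it assumes 4 bytes per float. It does not account for activations, temporary copies made during import, or the JVM heap, so real usage will be higher.

```java
// Hypothetical helper, not part of Tika or DL4J: estimates the size of the
// single flat parameter buffer requested in the stack trace above.
public class Vgg16MemoryEstimate {

    // Assumption: parameters are stored as 32-bit floats (4 bytes each).
    public static final long FLOAT_BYTES = 4L;

    // Number of floats requested by FloatPointer(138357544) in the trace.
    public static final long REQUESTED_FLOATS = 138_357_544L;

    // Bytes needed to hold the given number of floats.
    public static long bytesForFloats(long count) {
        return count * FLOAT_BYTES;
    }

    // Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
    public static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytes = bytesForFloats(REQUESTED_FLOATS);
        System.out.printf("floats requested: %d%n", REQUESTED_FLOATS);
        System.out.printf("single buffer:    %d bytes (%.1f MiB)%n", bytes, toMiB(bytes));
    }
}
```

          So the one allocation alone is a little over half a gigabyte, on top of the 1G the trace already reports as allocated. On a 4 GB machine that is plausibly enough for the native allocator to refuse the request, which would be consistent with the pull request description recommending -Xmx3G or more for this model.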


            People

            • Assignee:
              chrismattmann Chris A. Mattmann
              Reporter:
              asmehra95 Avtar Singh
            • Votes:
              0
              Watchers:
              6

                Time Tracking

                Estimated:
                Original Estimate - 672h
                Remaining:
                Remaining Estimate - 672h
                Logged:
                Time Spent - Not Specified

                  Development