Details

      Description

      Author age can be the first feature; more can be added later.


      Integrating work done on age classification. More details about the classifier are in the repo below:
      https://github.com/USCDataScience/Age-Predictor

      The Git repo has a Java client that can be integrated into Tika.

          Activity

          chrismattmann Chris A. Mattmann added a comment -

          sounds great Madhav Sharan any progress?

          msharan@usc.edu Madhav Sharan added a comment -

          I did raise a PR in https://github.com/apache/tika/pull/186

          Don't know why it was not tracked here.

          Once you review it, I'll push AgePredicter jar to maven central

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-310274450

          thanks @smadha missed this will review now!

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          chrismattmann Chris A. Mattmann added a comment -

          sorry I missed it! will look now

          githubbot ASF GitHub Bot added a comment -

          smadha commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-312561843

          @chrismattmann - Any comments?

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-313587299

          Finally got this working!

          ```
          LMC-053601:tika-parsers mattmann$ java -cp ../tika-app/target/tika-app-1.16-SNAPSHOT.jar:./model org.apache.tika.cli.TikaCLI --config=src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml -m test.txt
          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          TIFFImageWriter not loaded. tiff files will not be processed
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          J2KImageReader not loaded. JPEG2000 files will not be processed.
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.

          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: Tesseract OCR is installed and will be automatically applied to image files.
          This may dramatically slow down content extraction (TIKA-2359).
          As of Tika 1.15 (and prior versions), Tesseract is automatically called.
          In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: org.xerial's sqlite-jdbc is not loaded.
          Please provide the jar on your classpath to parse sqlite files.
          See tika-parsers/pom.xml for the correct version.
          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          TIFFImageWriter not loaded. tiff files will not be processed
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          J2KImageReader not loaded. JPEG2000 files will not be processed.
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.

          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: Tesseract OCR is installed and will be automatically applied to image files.
          This may dramatically slow down content extraction (TIKA-2359).
          As of Tika 1.15 (and prior versions), Tesseract is automatically called.
          In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
          Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: org.xerial's sqlite-jdbc is not loaded.
          Please provide the jar on your classpath to parse sqlite files.
          See tika-parsers/pom.xml for the correct version.
          INFO Running Spark version 2.0.0
          WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          INFO Changing view acls to: mattmann
          INFO Changing modify acls to: mattmann
          INFO Changing view acls groups to:
          INFO Changing modify acls groups to:
          INFO SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mattmann); groups with view permissions: Set(); users with modify permissions: Set(mattmann); groups with modify permissions: Set()
          INFO Successfully started service 'sparkDriver' on port 51510.
          INFO Registering MapOutputTracker
          INFO Registering BlockManagerMaster
          INFO Created local directory at /private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/blockmgr-bd30e8b2-1f38-49f9-b170-c3a95a7e312b
          INFO MemoryStore started with capacity 2004.6 MB
          INFO Registering OutputCommitCoordinator
          INFO Logging initialized @1597ms
          INFO jetty-9.2.z-SNAPSHOT
          INFO Started o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@14c01636{/storage,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@75ed9710{/environment,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@435871cb{/executors,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@192d74fb{/static,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@4bef0fe3{/,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@62ea3440{/api,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,AVAILABLE}
          INFO Started ServerConnector@25748410{HTTP/1.1}{0.0.0.0:4040}
          INFO Started @1705ms
          INFO Successfully started service 'SparkUI' on port 4040.
          INFO Bound SparkUI to 0.0.0.0, and started at http://192.168.1.65:4040
          INFO Starting executor ID driver on host localhost
          INFO Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51511.
          INFO Server created on 192.168.1.65:51511
          INFO Registering BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
          INFO Registering block manager 192.168.1.65:51511 with 2004.6 MB RAM, BlockManagerId(driver, 192.168.1.65, 51511)
          INFO Registered BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
          INFO Started o.s.j.s.ServletContextHandler@5305c37d{/metrics/json,null,AVAILABLE}
          WARN Use an existing SparkContext, some configuration may not take effect.
          INFO Started o.s.j.s.ServletContextHandler@3c1e3314{/SQL,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@78e16155{/SQL/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@50b0bc4c{/SQL/execution,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@13c612bd{/SQL/execution/json,null,AVAILABLE}
          INFO Started o.s.j.s.ServletContextHandler@28fa700e{/static/sql,null,AVAILABLE}
          INFO Warehouse path is 'file:/Users/mattmann/tmp/tika1.15/tika-parsers/spark-warehouse'.
          INFO Block broadcast_0 stored as values in memory (estimated size 6.1 MB, free 1998.5 MB)
          INFO Block broadcast_0_piece0 stored as bytes in memory (estimated size 488.5 KB, free 1998.0 MB)
          INFO Added broadcast_0_piece0 in memory on 192.168.1.65:51511 (size: 488.5 KB, free: 2004.1 MB)
          INFO Created broadcast 0 from broadcast at CountVectorizer.scala:243
          INFO Code generated in 1407.24616 ms
          INFO Starting job: first at AgePredicterLocal.java:114
          INFO Got job 0 (first at AgePredicterLocal.java:114) with 1 output partitions
          INFO Final stage: ResultStage 0 (first at AgePredicterLocal.java:114)
          INFO Parents of final stage: List()
          INFO Missing parents: List()
          INFO Submitting ResultStage 0 (MapPartitionsRDD[3] at javaRDD at AgePredicterLocal.java:112), which has no missing parents
          INFO Block broadcast_1 stored as values in memory (estimated size 10.5 KB, free 1998.0 MB)
          INFO Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3 KB, free 1998.0 MB)
          INFO Added broadcast_1_piece0 in memory on 192.168.1.65:51511 (size: 5.3 KB, free: 2004.1 MB)
          INFO Created broadcast 1 from broadcast at DAGScheduler.scala:1012
          INFO Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at javaRDD at AgePredicterLocal.java:112)
          INFO Adding task set 0.0 with 1 tasks
          INFO Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 6477 bytes)
          INFO Running task 0.0 in stage 0.0 (TID 0)
          INFO Code generated in 16.846256 ms
          INFO Finished task 0.0 in stage 0.0 (TID 0). 3228 bytes result sent to driver
          INFO Finished task 0.0 in stage 0.0 (TID 0) in 90 ms on localhost (1/1)
          INFO Removed TaskSet 0.0, whose tasks have all completed, from pool
          INFO ResultStage 0 (first at AgePredicterLocal.java:114) finished in 0.103 s
          INFO Job 0 finished: first at AgePredicterLocal.java:114, took 0.161496 s
          Content-Length: 17
          Content-Type: text/plain
          Estimated-Author-Age: 32.29913797083779
          X-Parsed-By: org.apache.tika.parser.CompositeParser
          X-Parsed-By: org.apache.tika.parser.recognition.AgeRecogniser
          resourceName: test.txt
          INFO Invoking stop() from shutdown hook
          INFO Stopped ServerConnector@25748410{HTTP/1.1} {0.0.0.0:4040}
          INFO Stopped o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@62ea3440{/api,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@4bef0fe3{/,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@192d74fb{/static,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@435871cb{/executors,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@75ed9710{/environment,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@14c01636{/storage,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,UNAVAILABLE}
          INFO Stopped o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,UNAVAILABLE}
          INFO Stopped Spark web UI at http://192.168.1.65:4040
          INFO MapOutputTrackerMasterEndpoint stopped!
          INFO MemoryStore cleared
          INFO BlockManager stopped
          INFO BlockManagerMaster stopped
          INFO OutputCommitCoordinator stopped!
          INFO Successfully stopped SparkContext
          INFO Shutdown hook called
          INFO Deleting directory /private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/spark-fa52d6bc-863e-4ee1-98da-8352c0c5c84e
          LMC-053601:tika-parsers mattmann$
          ```

          Will commit now!
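
For anyone scripting against a run like the one above: the `-m` output mixes Spark log lines with `Key: value` metadata lines, so the age has to be picked out by its key. Below is a minimal sketch of that, assuming the output has been captured as a single string. The class and method names (`AgeFromTikaOutput`, `valuesFor`, `estimatedAuthorAge`) are hypothetical and not part of Tika; only the `Estimated-Author-Age` and `X-Parsed-By` keys are taken from the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

public class AgeFromTikaOutput {

    // Metadata key exactly as it appears in the console output above.
    public static final String AGE_KEY = "Estimated-Author-Age";

    /** Returns every value printed for the given key, in order of appearance. */
    public static List<String> valuesFor(String output, String key) {
        String prefix = key + ": ";
        List<String> values = new ArrayList<>();
        // "\\R" matches any line break, so Unix and Windows output both work.
        for (String line : output.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(prefix)) {
                values.add(trimmed.substring(prefix.length()).trim());
            }
        }
        return values;
    }

    /** Returns the first estimated age, or empty if the key is absent. */
    public static OptionalDouble estimatedAuthorAge(String output) {
        List<String> values = valuesFor(output, AGE_KEY);
        if (values.isEmpty()) {
            return OptionalDouble.empty();
        }
        // Throws NumberFormatException if the value is not a number.
        return OptionalDouble.of(Double.parseDouble(values.get(0)));
    }

    public static void main(String[] args) {
        // A few lines copied from the run above, log noise included.
        String sample = String.join("\n",
            "INFO Job 0 finished: first at AgePredicterLocal.java:114, took 0.161496 s",
            "Content-Type: text/plain",
            "Estimated-Author-Age: 32.29913797083779",
            "X-Parsed-By: org.apache.tika.parser.CompositeParser",
            "X-Parsed-By: org.apache.tika.parser.recognition.AgeRecogniser");
        System.out.println(estimatedAuthorAge(sample));
        System.out.println(valuesFor(sample, "X-Parsed-By"));
    }
}
```

Matching on the exact key prefix, rather than splitting every line on `: `, keeps log lines such as `WARNING: ...` or `INFO Starting job: ...` from being mistaken for metadata.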

          githubbot ASF GitHub Bot added a comment -

          chrismattmann closed pull request #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186

          githubbot ASF GitHub Bot added a comment -

          smadha commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-313587646

          Thanks @chrismattmann

          chrismattmann Chris A. Mattmann added a comment -

          merged into master, thanks Madhav Sharan, Thamme Gowda and Tim Allison

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Tika-trunk #1319 (See https://builds.apache.org/job/Tika-trunk/1319/)
          Fix Felix bundle rules for Age Prediction Parser OGSI bundle. TIKA-1988. (mattmann: https://github.com/apache/tika/commit/9be1785e948822c58138bc4b660ec4421ee26e5d)

          • (edit) tika-bundle/pom.xml
          tallison@mitre.org Tim Allison added a comment -

          1) Would it be possible to allow for failure to get/find models?

          Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (default) on project tika-parsers: An Ant BuildException has occured: Warning: Could not find file C:\blah\tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\en-pos-maxent.bin to copy.
          [ERROR] around Ant part ...<copy file="C:\blah\tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/en-pos-maxent.bin" todir="C:\blahtika-asf2-git-1.x\tika-parsers/model/opennlp/"/>... @ 4:238 in C:\blah\tika-asf2-git-1.x\tika-parsers\target\antrun\build-main.xml
          [ERROR] -> [Help 1]
          [ERROR]
          [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
          [ERROR] Re-run Maven using the -X switch to enable full debug logging.

          2) Can we put this in a separate module or try to pare down the dependencies?
          - edu.usc.ir:age-predictor-api:jar:1.0:compile
          [INFO] - edu.usc.ir:age-predictor-cli:jar:1.0:compile
          [INFO] +- edu.usc.ir:age-predictor-opennlp:jar:1.0:compile
          [INFO] | +- (org.apache.opennlp:opennlp-tools:jar:1.6.0:compile - omitted for duplicate)
          [INFO] | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.12; omitted for duplicate)
          [INFO] | - (commons-io:commons-io:jar:2.5:compile - omitted for duplicate)
          [INFO] +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.12; omitted for duplicate)
          [INFO] +- (commons-io:commons-io:jar:2.5:compile - omitted for duplicate)
          [INFO] - org.apache.spark:spark-mllib_2.10:jar:2.0.0:compile
          [INFO] +- org.apache.spark:spark-core_2.10:jar:2.0.0:compile
          [INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:compile
          [INFO] | | +- org.apache.avro:avro-ipc:jar:1.7.7:compile
          [INFO] | | | +- org.apache.avro:avro:jar:1.7.7:compile
          [INFO] | | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
          [INFO] | | | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
          [INFO] | | | | +- (com.thoughtworks.paranamer:paranamer:jar:2.3:compile - omitted for conflict with 2.6)
          [INFO] | | | | +- (org.xerial.snappy:snappy-java:jar:1.0.5:compile - omitted for conflict with 1.1.2.4)
          [INFO] | | | | +- (org.apache.commons:commons-compress:jar:1.4.1:compile - omitted for conflict with 1.14)
          [INFO] | | | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
          [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
          [INFO] | | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
          [INFO] | | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
          [INFO] | | +- org.apache.avro:avro-ipc:jar:tests:1.7.7:compile
          [INFO] | | | +- (org.apache.avro:avro:jar:1.7.7:compile - omitted for duplicate)
          [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
          [INFO] | | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
          [INFO] | | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
          [INFO] | | +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
          [INFO] | | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
          [INFO] | | | - (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
          [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
          [INFO] | +- com.twitter:chill_2.10:jar:0.8.0:compile
          [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.5:compile - omitted for conflict with 2.10.6)
          [INFO] | | +- (com.twitter:chill-java:jar:0.8.0:compile - omitted for duplicate)
          [INFO] | | - com.esotericsoftware:kryo-shaded:jar:3.0.3:compile
          [INFO] | | +- com.esotericsoftware:minlog:jar:1.3.0:compile
          [INFO] | | - org.objenesis:objenesis:jar:2.1:compile
          [INFO] | +- com.twitter:chill-java:jar:0.8.0:compile
          [INFO] | | - (com.esotericsoftware:kryo-shaded:jar:3.0.3:compile - omitted for duplicate)
          [INFO] | +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:compile
          [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
          [INFO] | | +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
          [INFO] | | | +- (org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | +- (com.google.guava:guava:jar:11.0.2:compile - omitted for conflict with 17.0)
          [INFO] | | | +- commons-cli:commons-cli:jar:1.2:compile
          [INFO] | | | +- org.apache.commons:commons-math:jar:2.1:compile
          [INFO] | | | +- xmlenc:xmlenc:jar:0.52:compile
          [INFO] | | | +- (commons-httpclient:commons-httpclient:jar:3.1:compile - omitted for duplicate)
          [INFO] | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.10)
          [INFO] | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
          [INFO] | | | +- (commons-net:commons-net:jar:3.1:compile - omitted for conflict with 2.2)
          [INFO] | | | +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
          [INFO] | | | +- (commons-lang:commons-lang:jar:2.5:compile - omitted for conflict with 2.6)
          [INFO] | | | +- commons-configuration:commons-configuration:jar:1.6:compile
          [INFO] | | | | +- commons-collections:commons-collections:jar:3.2.1:compile
          [INFO] | | | | +- (commons-lang:commons-lang:jar:2.4:compile - omitted for conflict with 2.6)
          [INFO] | | | | +- commons-digester:commons-digester:jar:1.8:compile
          [INFO] | | | | | - commons-beanutils:commons-beanutils:jar:1.7.0:compile
          [INFO] | | | | - commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
          [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:runtime - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile - omitted for conflict with 1.9.13)
          [INFO] | | | +- (org.apache.avro:avro:jar:1.7.4:compile - omitted for conflict with 1.7.7)
          [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
          [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.10)
          [INFO] | | | | +- (log4j:log4j:jar:1.2.17:runtime - omitted for duplicate)
          [INFO] | | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:runtime - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
          [INFO] | | | - (org.apache.commons:commons-compress:jar:1.4.1:compile - omitted for conflict with 1.14)
          [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
          [INFO] | | | +- (com.google.guava:guava:jar:11.0.2:compile - omitted for conflict with 17.0)
          [INFO] | | | +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
          [INFO] | | | +- (commons-cli:commons-cli:jar:1.2:compile - omitted for duplicate)
          [INFO] | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.10)
          [INFO] | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
          [INFO] | | | +- (commons-lang:commons-lang:jar:2.5:compile - omitted for conflict with 2.6)
          [INFO] | | | +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
          [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile - omitted for conflict with 1.9.13)
          [INFO] | | | - (xmlenc:xmlenc:jar:0.52:compile - omitted for duplicate)
          [INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.2.0:compile
          [INFO] | | | +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile
          [INFO] | | | | +- (org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | | +- org.apache.hadoop:hadoop-yarn-client:jar:2.2.0:compile
          [INFO] | | | | | +- (org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | | | +- (org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
          [INFO] | | | | | +- (com.google.inject:guice:jar:3.0:compile - omitted for duplicate)
          [INFO] | | | | | +- (com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | | +- (com.sun.jersey:jersey-server:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | | +- (com.sun.jersey:jersey-json:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | | - (com.sun.jersey.contribs:jersey-guice:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | +- (org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | | +- org.apache.hadoop:hadoop-yarn-server-common:jar:2.2.0:compile
          [INFO] | | | | | +- (org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
          [INFO] | | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
          [INFO] | | | | | +- (com.google.inject:guice:jar:3.0:compile - omitted for duplicate)
          [INFO] | | | | | +- (com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | | +- (com.sun.jersey:jersey-server:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | | +- (com.sun.jersey:jersey-json:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | | - (com.sun.jersey.contribs:jersey-guice:jar:1.9:compile - omitted for duplicate)
          [INFO] | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | +- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile
          [INFO] | | | | +- (org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | +- org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile
          [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | - (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
          [INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
          [INFO] | | | +- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
          [INFO] | | | | +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
          [INFO] | | | | +- (org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | | - (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
          [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.2.0:compile
          [INFO] | | | +- (org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | +- (org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile - omitted for duplicate)
          [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
          [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] | | - org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
          [INFO] | +- org.apache.spark:spark-launcher_2.10:jar:2.0.0:compile
          [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] | +- org.apache.spark:spark-network-common_2.10:jar:2.0.0:compile
          [INFO] | | +- (io.netty:netty-all:jar:4.0.29.Final:compile - omitted for duplicate)
          [INFO] | | +- (com.google.code.findbugs:jsr305:jar:1.3.9:compile - omitted for duplicate)
          [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] | +- org.apache.spark:spark-network-shuffle_2.10:jar:2.0.0:compile
          [INFO] | | +- (org.apache.spark:spark-network-common_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | +- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:compile
          [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
          [INFO] | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.5:compile
          [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] | +- org.apache.spark:spark-unsafe_2.10:jar:2.0.0:compile
          [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | +- (com.twitter:chill_2.10:jar:0.8.0:compile - omitted for duplicate)
          [INFO] | | +- (com.google.code.findbugs:jsr305:jar:1.3.9:compile - omitted for duplicate)
          [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] | +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile
          [INFO] | | +- (commons-codec:commons-codec:jar:1.3:compile - omitted for conflict with 1.10)
          [INFO] | | - commons-httpclient:commons-httpclient:jar:3.1:compile
          [INFO] | | - (commons-codec:commons-codec:jar:1.2:compile - omitted for conflict with 1.10)
          [INFO] | +- org.apache.curator:curator-recipes:jar:2.4.0:compile
          [INFO] | | +- org.apache.curator:curator-framework:jar:2.4.0:compile
          [INFO] | | | +- org.apache.curator:curator-client:jar:2.4.0:compile
          [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
          [INFO] | | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
          [INFO] | | | | - (com.google.guava:guava:jar:14.0.1:compile - omitted for conflict with 17.0)
          [INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
          [INFO] | | | - (com.google.guava:guava:jar:14.0.1:compile - omitted for conflict with 17.0)
          [INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
          [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.1; omitted for duplicate)
          [INFO] | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.6.1; omitted for duplicate)
          [INFO] | | | - (log4j:log4j:jar:1.2.15:compile - omitted for conflict with 1.2.17)
          [INFO] | | - (com.google.guava:guava:jar:14.0.1:compile - omitted for conflict with 17.0)
          [INFO] | +- javax.servlet:javax.servlet-api:jar:3.1.0:compile
          [INFO] | +- org.apache.commons:commons-lang3:jar:3.3.2:compile
          [INFO] | +- (org.apache.commons:commons-math3:jar:3.4.1:compile - omitted for duplicate)
          [INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:compile
          [INFO] | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
          [INFO] | +- (org.slf4j:jul-to-slf4j:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
          [INFO] | +- (org.slf4j:jcl-over-slf4j:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
          [INFO] | +- log4j:log4j:jar:1.2.17:compile
          [INFO] | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
          [INFO] | +- com.ning:compress-lzf:jar:1.0.3:compile
          [INFO] | +- org.xerial.snappy:snappy-java:jar:1.1.2.4:compile
          [INFO] | +- net.jpountz.lz4:lz4:jar:1.3.0:compile
          [INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:compile
          [INFO] | +- commons-net:commons-net:jar:2.2:compile
          [INFO] | +- org.scala-lang:scala-library:jar:2.10.6:compile
          [INFO] | +- org.json4s:json4s-jackson_2.10:jar:3.2.11:compile
          [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
          [INFO] | | +- org.json4s:json4s-core_2.10:jar:3.2.11:compile
          [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
          [INFO] | | | +- org.json4s:json4s-ast_2.10:jar:3.2.11:compile
          [INFO] | | | | - (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
          [INFO] | | | +- com.thoughtworks.paranamer:paranamer:jar:2.6:compile
          [INFO] | | | - org.scala-lang:scalap:jar:2.10.0:compile
          [INFO] | | | - org.scala-lang:scala-compiler:jar:2.10.0:compile
          [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
          [INFO] | | | - (org.scala-lang:scala-reflect:jar:2.10.0:compile - omitted for conflict with 2.10.6)
          [INFO] | | - (com.fasterxml.jackson.core:jackson-databind:jar:2.3.1:compile - omitted for conflict with 2.6.5)
          [INFO] | +- org.glassfish.jersey.core:jersey-client:jar:2.22.2:compile
          [INFO] | | +- (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | +- org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile
          [INFO] | | | +- org.glassfish.hk2:hk2-utils:jar:2.4.0-b34:compile
          [INFO] | | | - org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34:compile
          [INFO] | | +- org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile
          [INFO] | | - org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile
          [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2:hk2-utils:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | - org.javassist:javassist:jar:3.18.1-GA:compile
          [INFO] | +- org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile
          [INFO] | | +- (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
          [INFO] | | +- (javax.annotation:javax.annotation-api:jar:1.2:compile - omitted for duplicate)
          [INFO] | | +- org.glassfish.jersey.bundles.repackaged:jersey-guava:jar:2.22.2:compile
          [INFO] | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | - org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:compile
          [INFO] | +- org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile
          [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.jersey.core:jersey-client:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | +- (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
          [INFO] | | +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.22.2:compile
          [INFO] | | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | | +- (org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | | - (org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:compile - omitted for duplicate)
          [INFO] | | +- (javax.annotation:javax.annotation-api:jar:1.2:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | - javax.validation:validation-api:jar:1.1.0.Final:compile
          [INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2:compile
          [INFO] | | +- (org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | - (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
          [INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2:compile
          [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | +- (org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile - omitted for duplicate)
          [INFO] | | - (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
          [INFO] | +- org.apache.mesos:mesos:jar:shaded-protobuf:0.21.1:compile
          [INFO] | +- io.netty:netty-all:jar:4.0.29.Final:compile
          [INFO] | +- io.netty:netty:jar:3.8.0.Final:compile
          [INFO] | +- com.clearspring.analytics:stream:jar:2.7.0:compile
          [INFO] | +- io.dropwizard.metrics:metrics-core:jar:3.1.2:compile
          [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
          [INFO] | +- io.dropwizard.metrics:metrics-jvm:jar:3.1.2:compile
          [INFO] | | +- (io.dropwizard.metrics:metrics-core:jar:3.1.2:compile - omitted for duplicate)
          [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
          [INFO] | +- io.dropwizard.metrics:metrics-json:jar:3.1.2:compile
          [INFO] | | +- (io.dropwizard.metrics:metrics-core:jar:3.1.2:compile - omitted for duplicate)
          [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.4.2:compile - omitted for conflict with 2.6.5)
          [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
          [INFO] | +- io.dropwizard.metrics:metrics-graphite:jar:3.1.2:compile
          [INFO] | | +- (io.dropwizard.metrics:metrics-core:jar:3.1.2:compile - omitted for duplicate)
          [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
          [INFO] | +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile
          [INFO] | | +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.6.0:compile - omitted for conflict with 2.6.5)
          [INFO] | | - (com.fasterxml.jackson.core:jackson-core:jar:2.6.5:compile - omitted for conflict with 2.8.1)
          [INFO] | +- com.fasterxml.jackson.module:jackson-module-scala_2.10:jar:2.6.5:compile
          [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.6:compile - omitted for duplicate)
          [INFO] | | +- org.scala-lang:scala-reflect:jar:2.10.6:compile
          [INFO] | | | - (org.scala-lang:scala-library:jar:2.10.6:compile - omitted for duplicate)
          [INFO] | | +- (com.fasterxml.jackson.core:jackson-core:jar:2.6.5:compile - omitted for conflict with 2.8.1)
          [INFO] | | +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.6.5:compile - omitted for duplicate)
          [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
          [INFO] | | - com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.6.5:compile
          [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
          [INFO] | | - (com.thoughtworks.paranamer:paranamer:jar:2.6:compile - omitted for duplicate)
          [INFO] | +- org.apache.ivy:ivy:jar:2.4.0:compile
          [INFO] | +- oro:oro:jar:2.0.8:compile
          [INFO] | +- net.razorvine:pyrolite:jar:4.9:compile
          [INFO] | +- net.sf.py4j:py4j:jar:0.10.1:compile
          [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] +- org.apache.spark:spark-streaming_2.10:jar:2.0.0:compile
          [INFO] | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | +- (org.scala-lang:scala-library:jar:2.10.6:compile - omitted for duplicate)
          [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] +- org.apache.spark:spark-sql_2.10:jar:2.0.0:compile
          [INFO] | +- com.univocity:univocity-parsers:jar:2.1.1:compile
          [INFO] | +- org.apache.spark:spark-sketch_2.10:jar:2.0.0:compile
          [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | +- org.apache.spark:spark-catalyst_2.10:jar:2.0.0:compile
          [INFO] | | +- (org.scala-lang:scala-reflect:jar:2.10.6:compile - omitted for duplicate)
          [INFO] | | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | +- (org.apache.spark:spark-unsafe_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | | +- org.codehaus.janino:janino:jar:2.7.8:compile
          [INFO] | | | - org.codehaus.janino:commons-compiler:jar:2.7.8:compile
          [INFO] | | +- org.antlr:antlr4-runtime:jar:4.5.3:compile
          [INFO] | | +- (commons-codec:commons-codec:jar:1.10:compile - omitted for duplicate)
          [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | +- org.apache.parquet:parquet-column:jar:1.7.0:compile
          [INFO] | | +- org.apache.parquet:parquet-common:jar:1.7.0:compile
          [INFO] | | +- org.apache.parquet:parquet-encoding:jar:1.7.0:compile
          [INFO] | | | +- (org.apache.parquet:parquet-common:jar:1.7.0:compile - omitted for duplicate)
          [INFO] | | | +- org.apache.parquet:parquet-generator:jar:1.7.0:compile
          [INFO] | | | | - (org.apache.parquet:parquet-common:jar:1.7.0:compile - omitted for duplicate)
          [INFO] | | | - (commons-codec:commons-codec:jar:1.5:compile - omitted for conflict with 1.10)
          [INFO] | | - (commons-codec:commons-codec:jar:1.5:compile - omitted for conflict with 1.10)
          [INFO] | +- org.apache.parquet:parquet-hadoop:jar:1.7.0:compile
          [INFO] | | +- (org.apache.parquet:parquet-column:jar:1.7.0:compile - omitted for duplicate)
          [INFO] | | +- org.apache.parquet:parquet-format:jar:2.3.0-incubating:compile
          [INFO] | | +- org.apache.parquet:parquet-jackson:jar:1.7.0:compile
          [INFO] | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.11:compile - omitted for conflict with 1.9.13)
          [INFO] | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.11:compile - omitted for conflict with 1.9.13)
          [INFO] | | - (org.xerial.snappy:snappy-java:jar:1.1.1.6:compile - omitted for conflict with 1.1.2.4)
          [INFO] | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
          [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] +- org.apache.spark:spark-graphx_2.10:jar:2.0.0:compile
          [INFO] | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | +- (org.apache.xbean:xbean-asm5-shaded:jar:4.4:compile - omitted for duplicate)
          [INFO] | +- com.github.fommil.netlib:core:jar:1.1.2:compile
          [INFO] | | - (net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile - omitted for duplicate)
          [INFO] | +- net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile
          [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] +- org.apache.spark:spark-mllib-local_2.10:jar:2.0.0:compile
          [INFO] | +- (org.scalanlp:breeze_2.10:jar:0.11.2:compile - omitted for duplicate)
          [INFO] | +- (org.apache.commons:commons-math3:jar:3.4.1:compile - omitted for duplicate)
          [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
          [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] +- org.scalanlp:breeze_2.10:jar:0.11.2:compile
          [INFO] | +- (org.scala-lang:scala-library:jar:2.10.4:compile - omitted for conflict with 2.10.6)
          [INFO] | +- org.scalanlp:breeze-macros_2.10:jar:0.11.2:compile
          [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.4:compile - omitted for conflict with 2.10.6)
          [INFO] | | +- org.scalamacros:quasiquotes_2.10:jar:2.0.0-M8:compile
          [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.3:compile - omitted for conflict with 2.10.6)
          [INFO] | | | - (org.scala-lang:scala-reflect:jar:2.10.3:compile - omitted for conflict with 2.10.6)
          [INFO] | | - (org.scala-lang:scala-reflect:jar:2.10.4:compile - omitted for conflict with 2.10.6)
          [INFO] | +- (com.github.fommil.netlib:core:jar:1.1.2:compile - omitted for duplicate)
          [INFO] | +- (net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile - omitted for duplicate)
          [INFO] | +- net.sf.opencsv:opencsv:jar:2.3:compile
          [INFO] | +- com.github.rwl:jtransforms:jar:2.4.0:compile
          [INFO] | +- org.spire-math:spire_2.10:jar:0.7.4:compile
          [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.2:compile - omitted for conflict with 2.10.6)
          [INFO] | | +- org.spire-math:spire-macros_2.10:jar:0.7.4:compile
          [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.2:compile - omitted for conflict with 2.10.6)
          [INFO] | | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6)
          [INFO] | | | - (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8)
          [INFO] | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6)
          [INFO] | | - (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8)
          [INFO] | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
          [INFO] +- org.apache.commons:commons-math3:jar:3.4.1:compile
          [INFO] +- org.jpmml:pmml-model:jar:1.2.15:compile
          [INFO] | - org.jpmml:pmml-schema:jar:1.2.15:compile
          [INFO] +- org.apache.spark:spark-tags_2.10:jar:2.0.0:compile
          [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
          [INFO] - org.spark-project.spark:unused:jar:1.0.0:compile

(net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile - omitted for duplicate) [INFO] | +- net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate) [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate) [INFO] +- org.apache.spark:spark-mllib-local_2.10:jar:2.0.0:compile [INFO] | +- (org.scalanlp:breeze_2.10:jar:0.11.2:compile - omitted for duplicate) [INFO] | +- (org.apache.commons:commons-math3:jar:3.4.1:compile - omitted for duplicate) [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate) [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate) [INFO] +- org.scalanlp:breeze_2.10:jar:0.11.2:compile [INFO] | +- (org.scala-lang:scala-library:jar:2.10.4:compile - omitted for conflict with 2.10.6) [INFO] | +- org.scalanlp:breeze-macros_2.10:jar:0.11.2:compile [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.4:compile - omitted for conflict with 2.10.6) [INFO] | | +- org.scalamacros:quasiquotes_2.10:jar:2.0.0-M8:compile [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.3:compile - omitted for conflict with 2.10.6) [INFO] | | | - (org.scala-lang:scala-reflect:jar:2.10.3:compile - omitted for conflict with 2.10.6) [INFO] | | - (org.scala-lang:scala-reflect:jar:2.10.4:compile - omitted for conflict with 2.10.6) [INFO] | +- (com.github.fommil.netlib:core:jar:1.1.2:compile - omitted for duplicate) [INFO] | +- (net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile - omitted for duplicate) [INFO] | +- net.sf.opencsv:opencsv:jar:2.3:compile [INFO] | +- com.github.rwl:jtransforms:jar:2.4.0:compile [INFO] | +- org.spire-math:spire_2.10:jar:0.7.4:compile [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.2:compile - omitted for conflict with 2.10.6) [INFO] | | +- org.spire-math:spire-macros_2.10:jar:0.7.4:compile [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.2:compile - 
omitted for conflict with 2.10.6) [INFO] | | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6) [INFO] | | | - (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8) [INFO] | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6) [INFO] | | - (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8) [INFO] | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate) [INFO] +- org.apache.commons:commons-math3:jar:3.4.1:compile [INFO] +- org.jpmml:pmml-model:jar:1.2.15:compile [INFO] | - org.jpmml:pmml-schema:jar:1.2.15:compile [INFO] +- org.apache.spark:spark-tags_2.10:jar:2.0.0:compile [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate) [INFO] - org.spark-project.spark:unused:jar:1.0.0:compile
          Hide
          chrismattmann Chris A. Mattmann added a comment -

          #1 - absolutely - I thought putting the model download in Thamme's ModelGetter.groovy script would ensure that models were available even in proxy environments. Tim, why weren't the models available for you?

          #2 - sure jiminey Christmas - wow that's a lot of dependencies. What do you think about tika-nlp, with this as the first entry?

          tallison@mitre.org Tim Allison added a comment - - edited

          1. No idea.

          2. Yes, rather. Tika-app ballooned to 181MB. Sounds good.

          tallison@mitre.org Tim Allison added a comment -

          3. At some point we should follow Konstantin Gribov's fantastic TIKA-2245 work and slf4j-ize logging...

          but that isn't as critical as 2.

          chrismattmann Chris A. Mattmann added a comment -

          Agree on #3. I'm going to take a first cut at tika-nlp. In the future, when we unify our recognisers for Object/Text, we should think about moving the NER stuff from tika-parsers into tika-nlp. I'm not going to bother now, b/c it would create a situation where people who previously had NER support in tika-app would in the future have to include tika-nlp.

          The other thing I think we should seriously consider: tika-app's size ballooned, as you put it, but who cares? I'll gladly take a 181MB jar file if it gives me capability A, B, C, D all in a box. Two thoughts there. First, we could stop worrying about keeping tika-app so small. Pros: easy, doesn't require anything special. Cons: size aficionados will be disappointed. Second, we could make a tika-app-full module and tika-server-full that is tika-app, plus tika-dl and tika-nlp. Thoughts there?

          tallison@mitre.org Tim Allison added a comment -

          Thought: lower expectations for 2.0 (put off parser composability and arbitrary metadata) and release pretty much as is (once we catch it up to trunk) at the end of the month.

          chrismattmann Chris A. Mattmann added a comment -

          Sounds good to me...almost done with tika-nlp will commit shortly.

          hudson Hudson added a comment -

          ABORTED: Integrated in Jenkins build Tika-trunk #1320 (See https://builds.apache.org/job/Tika-trunk/1320/)
          TIKA-1988 – allow for failure to copy age recognition models (tallison: https://github.com/apache/tika/commit/58a602f7c9e4a5666a33726767741be73e10cd09)

          • (edit) tika-parsers/pom.xml

          TIKA-1988 – allow for errors downloading models (tallison: https://github.com/apache/tika/commit/632f52db4713977aa93504517e57b8afe86e6e91)

          • (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/AgeRecogniserConfig.java

          hudson Hudson added a comment -

          ABORTED: Integrated in Jenkins build Tika-trunk #1321 (See https://builds.apache.org/job/Tika-trunk/1321/)
          add Tika-NLP module - move AgeRecogniser out of tika-parsers TIKA-1988 (mattmann: https://github.com/apache/tika/commit/e07d9e1de077c2f332094ce5125d1f4cf779d80d)

          • (delete) tika-parsers/src/main/java/org/apache/tika/parser/recognition/AgeRecogniserConfig.java
          • (add) tika-nlp/src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml
          • (edit) tika-parsers/pom.xml
          • (delete) tika-parsers/src/main/java/org/apache/tika/parser/recognition/AgeRecogniser.java
          • (delete) tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml
          • (add) tika-nlp/src/test/java/org/apache/tika/parser/recognition/AgeRecogniserTest.java
          • (add) tika-nlp/src/main/java/org/apache/tika/parser/recognition/AgeRecogniserConfig.java
          • (delete) tika-parsers/src/test/java/org/apache/tika/parser/recognition/AgeRecogniserTest.java
          • (add) tika-nlp/pom.xml
          • (add) tika-nlp/src/main/java/org/apache/tika/parser/recognition/AgeRecogniser.java

          tallison@mitre.org Tim Allison added a comment -

          Chris A. Mattmann, to confirm, you want the "model" directory at the same level as src?

          tika-nlp/
              model/
                  opennlp/
                      en-pos...bin
                      en-sent...bin
                      en-token...bin
                  org/
                      apache/
                          ...
              src/
                  main/
                      ...
          
          chrismattmann Chris A. Mattmann added a comment -

          For now yes Tim Allison until we fix https://github.com/USCDataScience/AgePredictor/issues/11 in a 1.1 release later.

          tallison@mitre.org Tim Allison added a comment -

          Thank you!

          "Tim why weren't the models available for you?"

          They weren't available because ModelGetter is only triggered when one of the model files isn't there. That file had been pulled successfully in earlier builds, so ModelGetter did not run. When I deleted the earlier models, ModelGetter was triggered and all model files were successfully downloaded.
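
To make the failure mode above concrete, here is a minimal Java sketch. It is only one reading of the comment, not the actual ModelGetter.groovy logic: the class, the method names, and the file names used in the usage note are all hypothetical. It contrasts a check keyed on a single file with a check over every expected model file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from ModelGetter.groovy.
class ModelCheckSketch {

    /** Sentinel-style check: a download is triggered only if this one file is absent. */
    static boolean sentinelTriggersDownload(Path modelDir, String sentinelFile) {
        return !Files.isRegularFile(modelDir.resolve(sentinelFile));
    }

    /** Stricter check: lists every expected model file that is absent. */
    static List<String> missingModels(Path modelDir, List<String> expectedFiles) {
        List<String> missing = new ArrayList<>();
        for (String name : expectedFiles) {
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

With a model directory that still holds an older file (say `old.bin`, a placeholder name) but lacks a newer one (`new.bin`), the sentinel check on `old.bin` would report that no download is needed, while `missingModels` would still report `new.bin`. That would be consistent with the models only appearing after the earlier ones were deleted.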

          msharan@usc.edu Madhav Sharan added a comment -

          I faced the same issue as Tim earlier. What do you guys think about using a Maven plugin for downloading the models instead of our own script?

          https://github.com/maven-download-plugin/maven-download-plugin

          I checked and it seems to work with proxies too, if that's the only issue. https://github.com/maven-download-plugin/maven-download-plugin/issues/74

          I think it could be a better fit, with no custom code. Open to discussion, though.

          grossws Konstantin Gribov added a comment -

          Tim Allison, my effort on migrating 2.x to slf4j is suspended because I lack spare time for it. I hope to continue it next month, but I'm still not sure whether anything will change. Of course it shouldn't prevent releasing 2.0, because it's mostly internal changes with slight modifications to downstream projects' dependencies.

          githubbot ASF GitHub Bot added a comment -

          r00t1ng commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346447960

          @chrismattmann
          I had the same issue with running the server on 1.17.
          Can you please let me know if there is any installation i missed. I had a successful build when installing the git version.

          Thank you

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346450270

          hi @r00t1ng what issue did you have?

          githubbot ASF GitHub Bot added a comment -

          r00t1ng commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346454415

          Hi

          Nov 22, 2017 8:41:43 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
          WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          TIFFImageWriter not loaded. tiff files will not be processed
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.
          J2KImageReader not loaded. JPEG2000 files will not be processed.
          See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
          for optional dependencies.

          Thank you

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346455515

          gotcha @r00t1ng those are just warnings. If you inspect the full output you'll see this:

          ```
          Content-Length: 17
          Content-Type: text/plain
          Estimated-Author-Age: 32.29913797083779
          X-Parsed-By: org.apache.tika.parser.CompositeParser
          X-Parsed-By: org.apache.tika.parser.recognition.AgeRecogniser
          resourceName: test.txt
          ```
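
For anyone scripting against output like the above, a minimal Java sketch of picking the `Estimated-Author-Age` value out of `Key: value` lines follows. It assumes the metadata arrives as plain text in the form shown in the comment. The class and method names are hypothetical and are not part of Tika.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical helper for "Key: value" text output; not a Tika API.
class MetadataOutputSketch {

    /** Collects "Key: value" lines; a repeated key (e.g. X-Parsed-By) keeps all its values. */
    static Map<String, List<String>> parse(String output) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int colon = line.indexOf(": ");
            if (colon <= 0) {
                continue; // not a "Key: value" line
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 2).trim();
            fields.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return fields;
    }

    /** Returns the first Estimated-Author-Age value, or empty if the field is absent. */
    static OptionalDouble estimatedAge(String output) {
        List<String> values = parse(output).get("Estimated-Author-Age");
        if (values == null || values.isEmpty()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(values.get(0)));
    }
}
```

An empty result from `estimatedAge` means the field was not present in the text that was passed in, which is different from the warnings quoted earlier in this thread.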

          githubbot ASF GitHub Bot added a comment -

          r00t1ng commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346456231

          Thx. My problem resides in using tika to attempt text extraction from image based pdfs. I installed tesseract and it is being used by tika. I am also using your python wrapper with basic parsing for content. I am not getting anything out of various documents. Sorry if I post in the wrong repository.
          Please advise.

          Thank you

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346464668

          hi @r00t1ng got it. So if you are doing text extraction from image based PDFs and using the python wrapper, it should be working. You can control what parsers are getting called by providing a custom tika-config.xml file. Depending on what type of PDF it is, you should check:

          1. Does Tesseract (outside of Tika) extract text from the PDF? If so what are the settings used from the command line?
          2. If Tesseract doesn't extract text outside of Tika then Tika won't b/c it's just a pass through to Tesseract on that part.
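
Check 1 above can be scripted. The following is a minimal Java sketch under stated assumptions: the `tesseract` binary is on the PATH, and the PDF page has already been rendered to an image file, since the tesseract command line takes image input. The class and method names are hypothetical. Passing `stdout` as the output base asks tesseract to print the recognised text instead of writing a file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for running Tesseract outside of Tika.
class TesseractCheckSketch {

    /** Builds the command line: tesseract <image> stdout */
    static List<String> command(String imagePath) {
        return Arrays.asList("tesseract", imagePath, "stdout");
    }

    /** Runs tesseract on one image and returns whatever text it printed. */
    static String run(String imagePath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(imagePath))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show tesseract's own messages
                .start();
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with status " + exitCode);
        }
        return text;
    }
}
```

If `run` returns an empty string for a page image, that points at check 2: Tika would not be expected to do better on that page, since it only passes the image through to Tesseract.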

          githubbot ASF GitHub Bot added a comment -

          r00t1ng commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346471235

          Tesseract is working from within Tika so no segregated jobs. But what if tika does some text extraction and does not pass the file to tesseract as well?
          Thank you

          githubbot ASF GitHub Bot added a comment -

          chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
          URL: https://github.com/apache/tika/pull/186#issuecomment-346478122

          hi @r00t1ng if you are calling Tesseract and it's working fine from within Tika, basically the best recommendation I have is to file some new JIRA tickets here http://issues.apache.org/jira/browse/TIKA - attach the files (PDFs) that you are not having good results with and then identify what the expected text is.


            People

            • Assignee:
              chrismattmann Chris A. Mattmann
              Reporter:
              msharan@usc.edu Madhav Sharan
            • Votes:
              0
              Watchers:
              6
