Details

      Description

      Author age can be the first feature; more can be added later.


      This integrates the work done on age classification. More details about the classifier are in the repo below:
      https://github.com/USCDataScience/Age-Predictor

      The Git repo has a Java client that can be integrated into Tika.

        Activity

        chrismattmann Chris A. Mattmann added a comment -

        sounds great Madhav Sharan any progress?

        msharan@usc.edu Madhav Sharan added a comment -

        I did raise a PR at https://github.com/apache/tika/pull/186

        I don't know why it was not tracked here.

        Once you review it, I'll push the AgePredicter jar to Maven Central.

        githubbot ASF GitHub Bot added a comment -

        chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
        URL: https://github.com/apache/tika/pull/186#issuecomment-310274450

        thanks @smadha missed this will review now!

        ----------------------------------------------------------------
        This is an automated message from the Apache Git Service.
        To respond to the message, please log on GitHub and use the
        URL above to go to the specific comment.

        For queries about this service, please contact Infrastructure at:
        users@infra.apache.org

        chrismattmann Chris A. Mattmann added a comment -

        sorry I missed it! will look now

        githubbot ASF GitHub Bot added a comment -

        smadha commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
        URL: https://github.com/apache/tika/pull/186#issuecomment-312561843

        @chrismattmann - Any comments?

        githubbot ASF GitHub Bot added a comment -

        chrismattmann commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
        URL: https://github.com/apache/tika/pull/186#issuecomment-313587299

        Finally got this working!

        ```
        LMC-053601:tika-parsers mattmann$ java -cp ../tika-app/target/tika-app-1.16-SNAPSHOT.jar:./model org.apache.tika.cli.TikaCLI --config=src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml -m test.txt
        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
        See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
        for optional dependencies.
        TIFFImageWriter not loaded. tiff files will not be processed
        See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
        for optional dependencies.
        J2KImageReader not loaded. JPEG2000 files will not be processed.
        See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
        for optional dependencies.

        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: Tesseract OCR is installed and will be automatically applied to image files.
        This may dramatically slow down content extraction (TIKA-2359).
        As of Tika 1.15 (and prior versions), Tesseract is automatically called.
        In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: org.xerial's sqlite-jdbc is not loaded.
        Please provide the jar on your classpath to parse sqlite files.
        See tika-parsers/pom.xml for the correct version.
        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: com.levigo.jbig2.JBIG2ImageReader not on class path. The ImageParser will skip jbig2 images
        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
        See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
        for optional dependencies.
        TIFFImageWriter not loaded. tiff files will not be processed
        See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
        for optional dependencies.
        J2KImageReader not loaded. JPEG2000 files will not be processed.
        See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
        for optional dependencies.

        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: Tesseract OCR is installed and will be automatically applied to image files.
        This may dramatically slow down content extraction (TIKA-2359).
        As of Tika 1.15 (and prior versions), Tesseract is automatically called.
        In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
        Jul 06, 2017 9:58:31 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
        WARNING: org.xerial's sqlite-jdbc is not loaded.
        Please provide the jar on your classpath to parse sqlite files.
        See tika-parsers/pom.xml for the correct version.
        INFO Running Spark version 2.0.0
        WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        INFO Changing view acls to: mattmann
        INFO Changing modify acls to: mattmann
        INFO Changing view acls groups to:
        INFO Changing modify acls groups to:
        INFO SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mattmann); groups with view permissions: Set(); users with modify permissions: Set(mattmann); groups with modify permissions: Set()
        INFO Successfully started service 'sparkDriver' on port 51510.
        INFO Registering MapOutputTracker
        INFO Registering BlockManagerMaster
        INFO Created local directory at /private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/blockmgr-bd30e8b2-1f38-49f9-b170-c3a95a7e312b
        INFO MemoryStore started with capacity 2004.6 MB
        INFO Registering OutputCommitCoordinator
        INFO Logging initialized @1597ms
        INFO jetty-9.2.z-SNAPSHOT
        INFO Started o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@14c01636{/storage,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@75ed9710{/environment,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@435871cb{/executors,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@192d74fb{/static,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@4bef0fe3{/,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@62ea3440{/api,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,AVAILABLE}
        INFO Started ServerConnector@25748410{HTTP/1.1}{0.0.0.0:4040}
        INFO Started @1705ms
        INFO Successfully started service 'SparkUI' on port 4040.
        INFO Bound SparkUI to 0.0.0.0, and started at http://192.168.1.65:4040
        INFO Starting executor ID driver on host localhost
        INFO Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51511.
        INFO Server created on 192.168.1.65:51511
        INFO Registering BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
        INFO Registering block manager 192.168.1.65:51511 with 2004.6 MB RAM, BlockManagerId(driver, 192.168.1.65, 51511)
        INFO Registered BlockManager BlockManagerId(driver, 192.168.1.65, 51511)
        INFO Started o.s.j.s.ServletContextHandler@5305c37d{/metrics/json,null,AVAILABLE}
        WARN Use an existing SparkContext, some configuration may not take effect.
        INFO Started o.s.j.s.ServletContextHandler@3c1e3314{/SQL,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@78e16155{/SQL/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@50b0bc4c{/SQL/execution,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@13c612bd{/SQL/execution/json,null,AVAILABLE}
        INFO Started o.s.j.s.ServletContextHandler@28fa700e{/static/sql,null,AVAILABLE}
        INFO Warehouse path is 'file:/Users/mattmann/tmp/tika1.15/tika-parsers/spark-warehouse'.
        INFO Block broadcast_0 stored as values in memory (estimated size 6.1 MB, free 1998.5 MB)
        INFO Block broadcast_0_piece0 stored as bytes in memory (estimated size 488.5 KB, free 1998.0 MB)
        INFO Added broadcast_0_piece0 in memory on 192.168.1.65:51511 (size: 488.5 KB, free: 2004.1 MB)
        INFO Created broadcast 0 from broadcast at CountVectorizer.scala:243
        INFO Code generated in 1407.24616 ms
        INFO Starting job: first at AgePredicterLocal.java:114
        INFO Got job 0 (first at AgePredicterLocal.java:114) with 1 output partitions
        INFO Final stage: ResultStage 0 (first at AgePredicterLocal.java:114)
        INFO Parents of final stage: List()
        INFO Missing parents: List()
        INFO Submitting ResultStage 0 (MapPartitionsRDD[3] at javaRDD at AgePredicterLocal.java:112), which has no missing parents
        INFO Block broadcast_1 stored as values in memory (estimated size 10.5 KB, free 1998.0 MB)
        INFO Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3 KB, free 1998.0 MB)
        INFO Added broadcast_1_piece0 in memory on 192.168.1.65:51511 (size: 5.3 KB, free: 2004.1 MB)
        INFO Created broadcast 1 from broadcast at DAGScheduler.scala:1012
        INFO Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at javaRDD at AgePredicterLocal.java:112)
        INFO Adding task set 0.0 with 1 tasks
        INFO Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 6477 bytes)
        INFO Running task 0.0 in stage 0.0 (TID 0)
        INFO Code generated in 16.846256 ms
        INFO Finished task 0.0 in stage 0.0 (TID 0). 3228 bytes result sent to driver
        INFO Finished task 0.0 in stage 0.0 (TID 0) in 90 ms on localhost (1/1)
        INFO Removed TaskSet 0.0, whose tasks have all completed, from pool
        INFO ResultStage 0 (first at AgePredicterLocal.java:114) finished in 0.103 s
        INFO Job 0 finished: first at AgePredicterLocal.java:114, took 0.161496 s
        Content-Length: 17
        Content-Type: text/plain
        Estimated-Author-Age: 32.29913797083779
        X-Parsed-By: org.apache.tika.parser.CompositeParser
        X-Parsed-By: org.apache.tika.parser.recognition.AgeRecogniser
        resourceName: test.txt
        INFO Invoking stop() from shutdown hook
        INFO Stopped ServerConnector@25748410{HTTP/1.1}{0.0.0.0:4040}
        INFO Stopped o.s.j.s.ServletContextHandler@27953a83{/stages/stage/kill,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@62ea3440{/api,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@4bef0fe3{/,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@192d74fb{/static,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@19fb8826{/executors/threadDump/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@79da1ec0{/executors/threadDump,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@609640d5{/executors/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@435871cb{/executors,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@4fc5e095{/environment/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@75ed9710{/environment,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@61884cb1{/storage/rdd/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@6b9ce1bf{/storage/rdd,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@590c73d3{/storage/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@14c01636{/storage,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@6f2cb653{/stages/pool/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@7a791b66{/stages/pool,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@40f1be1b{/stages/stage/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@4cc6fa2a{/stages/stage,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@30d4b288{/stages/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@51cd7ffc{/stages,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@40499e4f{/jobs/job/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@2fea7088{/jobs/job,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@5c87bfe2{/jobs/json,null,UNAVAILABLE}
        INFO Stopped o.s.j.s.ServletContextHandler@f73dcd6{/jobs,null,UNAVAILABLE}
        INFO Stopped Spark web UI at http://192.168.1.65:4040
        INFO MapOutputTrackerMasterEndpoint stopped!
        INFO MemoryStore cleared
        INFO BlockManager stopped
        INFO BlockManagerMaster stopped
        INFO OutputCommitCoordinator stopped!
        INFO Successfully stopped SparkContext
        INFO Shutdown hook called
        INFO Deleting directory /private/var/folders/n5/1d_k3z4s2293q8ntx_n8sw54mm5n_8/T/spark-fa52d6bc-863e-4ee1-98da-8352c0c5c84e
        LMC-053601:tika-parsers mattmann$
        ```

        Will commit now!
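
        In the `-m` output above, the metadata lines (`Key: value`) are interleaved with Spark log lines. For anyone scripting against that output, here is a minimal sketch of pulling out the `Estimated-Author-Age` value from captured CLI text. The class name `AgeMetadataExtractor` and the method `extractAge` are hypothetical and not part of Tika; the sketch assumes only the `Key: value` line format shown in the log.

```java
import java.util.OptionalDouble;

// Hypothetical helper, not part of Tika: scans captured TikaCLI -m output.
final class AgeMetadataExtractor {
    // Metadata key as it appears in the CLI output above.
    static final String AGE_KEY = "Estimated-Author-Age";

    /** Returns the first parseable age value found in the output, if any. */
    static OptionalDouble extractAge(String cliOutput) {
        String prefix = AGE_KEY + ":";
        for (String line : cliOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(prefix)) {
                String value = trimmed.substring(prefix.length()).trim();
                try {
                    return OptionalDouble.of(Double.parseDouble(value));
                } catch (NumberFormatException e) {
                    // Value was not numeric; keep scanning the remaining lines.
                }
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Sample lines copied from the log above.
        String sample = String.join("\n",
            "INFO Job 0 finished: first at AgePredicterLocal.java:114, took 0.161496 s",
            "Content-Type: text/plain",
            "Estimated-Author-Age: 32.29913797083779",
            "resourceName: test.txt");
        OptionalDouble age = extractAge(sample);
        System.out.println(age.isPresent() ? "age=" + age.getAsDouble() : "no age found");
    }
}
```

        Matching on the key prefix rather than on line position keeps the sketch independent of how many log lines Spark emits before and after the metadata.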

        githubbot ASF GitHub Bot added a comment -

        chrismattmann closed pull request #186: fix for TIKA-1988 contributed by msharan@usc.edu
        URL: https://github.com/apache/tika/pull/186

        githubbot ASF GitHub Bot added a comment -

        smadha commented on issue #186: fix for TIKA-1988 contributed by msharan@usc.edu
        URL: https://github.com/apache/tika/pull/186#issuecomment-313587646

        Thanks @chrismattmann

        chrismattmann Chris A. Mattmann added a comment -

        Merged into master, thanks Madhav Sharan, Thamme Gowda and Tim Allison.

        hudson Hudson added a comment -

        FAILURE: Integrated in Jenkins build Tika-trunk #1319 (See https://builds.apache.org/job/Tika-trunk/1319/)
        Fix Felix bundle rules for Age Prediction Parser OGSI bundle. TIKA-1988. (mattmann: https://github.com/apache/tika/commit/9be1785e948822c58138bc4b660ec4421ee26e5d)

        • (edit) tika-bundle/pom.xml
        tallison@mitre.org Tim Allison added a comment -

        1) Would it be possible to allow for failure to get/find models?

        Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (default) on project tika-parsers: An Ant BuildException has occured: Warning: Could not find file C:\blah\tika-parsers\src\test\resources\org\apache\tika\parser\ner\opennlp\en-pos-maxent.bin to copy.
        [ERROR] around Ant part ...<copy file="C:\blah\tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/en-pos-maxent.bin" todir="C:\blahtika-asf2-git-1.x\tika-parsers/model/opennlp/"/>... @ 4:238 in C:\blah\tika-asf2-git-1.x\tika-parsers\target\antrun\build-main.xml
        [ERROR] -> [Help 1]
        [ERROR]
        [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
        [ERROR] Re-run Maven using the -X switch to enable full debug logging.

        2) Can we put this in a separate module or try to pare down the dependencies?
        - edu.usc.ir:age-predictor-api:jar:1.0:compile
        [INFO] - edu.usc.ir:age-predictor-cli:jar:1.0:compile
        [INFO] +- edu.usc.ir:age-predictor-opennlp:jar:1.0:compile
        [INFO] | +- (org.apache.opennlp:opennlp-tools:jar:1.6.0:compile - omitted for duplicate)
        [INFO] | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.12; omitted for duplicate)
        [INFO] | - (commons-io:commons-io:jar:2.5:compile - omitted for duplicate)
        [INFO] +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.12; omitted for duplicate)
        [INFO] +- (commons-io:commons-io:jar:2.5:compile - omitted for duplicate)
        [INFO] - org.apache.spark:spark-mllib_2.10:jar:2.0.0:compile
        [INFO] +- org.apache.spark:spark-core_2.10:jar:2.0.0:compile
        [INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:compile
        [INFO] | | +- org.apache.avro:avro-ipc:jar:1.7.7:compile
        [INFO] | | | +- org.apache.avro:avro:jar:1.7.7:compile
        [INFO] | | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
        [INFO] | | | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
        [INFO] | | | | +- (com.thoughtworks.paranamer:paranamer:jar:2.3:compile - omitted for conflict with 2.6)
        [INFO] | | | | +- (org.xerial.snappy:snappy-java:jar:1.0.5:compile - omitted for conflict with 1.1.2.4)
        [INFO] | | | | +- (org.apache.commons:commons-compress:jar:1.4.1:compile - omitted for conflict with 1.14)
        [INFO] | | | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
        [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
        [INFO] | | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
        [INFO] | | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
        [INFO] | | +- org.apache.avro:avro-ipc:jar:tests:1.7.7:compile
        [INFO] | | | +- (org.apache.avro:avro:jar:1.7.7:compile - omitted for duplicate)
        [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
        [INFO] | | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
        [INFO] | | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
        [INFO] | | +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
        [INFO] | | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
        [INFO] | | | - (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
        [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
        [INFO] | +- com.twitter:chill_2.10:jar:0.8.0:compile
        [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.5:compile - omitted for conflict with 2.10.6)
        [INFO] | | +- (com.twitter:chill-java:jar:0.8.0:compile - omitted for duplicate)
        [INFO] | | - com.esotericsoftware:kryo-shaded:jar:3.0.3:compile
        [INFO] | | +- com.esotericsoftware:minlog:jar:1.3.0:compile
        [INFO] | | - org.objenesis:objenesis:jar:2.1:compile
        [INFO] | +- com.twitter:chill-java:jar:0.8.0:compile
        [INFO] | | - (com.esotericsoftware:kryo-shaded:jar:3.0.3:compile - omitted for duplicate)
        [INFO] | +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:compile
        [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
        [INFO] | | +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
        [INFO] | | | +- (org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | +- (com.google.guava:guava:jar:11.0.2:compile - omitted for conflict with 17.0)
        [INFO] | | | +- commons-cli:commons-cli:jar:1.2:compile
        [INFO] | | | +- org.apache.commons:commons-math:jar:2.1:compile
        [INFO] | | | +- xmlenc:xmlenc:jar:0.52:compile
        [INFO] | | | +- (commons-httpclient:commons-httpclient:jar:3.1:compile - omitted for duplicate)
        [INFO] | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.10)
        [INFO] | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
        [INFO] | | | +- (commons-net:commons-net:jar:3.1:compile - omitted for conflict with 2.2)
        [INFO] | | | +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
        [INFO] | | | +- (commons-lang:commons-lang:jar:2.5:compile - omitted for conflict with 2.6)
        [INFO] | | | +- commons-configuration:commons-configuration:jar:1.6:compile
        [INFO] | | | | +- commons-collections:commons-collections:jar:3.2.1:compile
        [INFO] | | | | +- (commons-lang:commons-lang:jar:2.4:compile - omitted for conflict with 2.6)
        [INFO] | | | | +- commons-digester:commons-digester:jar:1.8:compile
        [INFO] | | | | | - commons-beanutils:commons-beanutils:jar:1.7.0:compile
        [INFO] | | | | - commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
        [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:runtime - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile - omitted for conflict with 1.9.13)
        [INFO] | | | +- (org.apache.avro:avro:jar:1.7.4:compile - omitted for conflict with 1.7.7)
        [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
        [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.10)
        [INFO] | | | | +- (log4j:log4j:jar:1.2.17:runtime - omitted for duplicate)
        [INFO] | | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:runtime - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
        [INFO] | | | - (org.apache.commons:commons-compress:jar:1.4.1:compile - omitted for conflict with 1.14)
        [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
        [INFO] | | | +- (com.google.guava:guava:jar:11.0.2:compile - omitted for conflict with 17.0)
        [INFO] | | | +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
        [INFO] | | | +- (commons-cli:commons-cli:jar:1.2:compile - omitted for duplicate)
        [INFO] | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.10)
        [INFO] | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
        [INFO] | | | +- (commons-lang:commons-lang:jar:2.5:compile - omitted for conflict with 2.6)
        [INFO] | | | +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
        [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile - omitted for conflict with 1.9.13)
        [INFO] | | | - (xmlenc:xmlenc:jar:0.52:compile - omitted for duplicate)
        [INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.2.0:compile
        [INFO] | | | +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile
        [INFO] | | | | +- (org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | | +- org.apache.hadoop:hadoop-yarn-client:jar:2.2.0:compile
        [INFO] | | | | | +- (org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | | | +- (org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
        [INFO] | | | | | +- (com.google.inject:guice:jar:3.0:compile - omitted for duplicate)
        [INFO] | | | | | +- (com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | | +- (com.sun.jersey:jersey-server:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | | +- (com.sun.jersey:jersey-json:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | | - (com.sun.jersey.contribs:jersey-guice:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | +- (org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | | +- org.apache.hadoop:hadoop-yarn-server-common:jar:2.2.0:compile
        [INFO] | | | | | +- (org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
        [INFO] | | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | | | +- (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
        [INFO] | | | | | +- (com.google.inject:guice:jar:3.0:compile - omitted for duplicate)
        [INFO] | | | | | +- (com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | | +- (com.sun.jersey:jersey-server:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | | +- (com.sun.jersey:jersey-json:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | | - (com.sun.jersey.contribs:jersey-guice:jar:1.9:compile - omitted for duplicate)
        [INFO] | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | +- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile
        [INFO] | | | | +- (org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | +- org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile
        [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | - (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
        [INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
        [INFO] | | | +- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
        [INFO] | | | | +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
        [INFO] | | | | +- (org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | | - (commons-io:commons-io:jar:2.1:compile - omitted for conflict with 2.5)
        [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.2.0:compile
        [INFO] | | | +- (org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | +- (org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile - omitted for duplicate)
        [INFO] | | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
        [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | | - (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] | | - org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
        [INFO] | +- org.apache.spark:spark-launcher_2.10:jar:2.0.0:compile
        [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] | +- org.apache.spark:spark-network-common_2.10:jar:2.0.0:compile
        [INFO] | | +- (io.netty:netty-all:jar:4.0.29.Final:compile - omitted for duplicate)
        [INFO] | | +- (com.google.code.findbugs:jsr305:jar:1.3.9:compile - omitted for duplicate)
        [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] | +- org.apache.spark:spark-network-shuffle_2.10:jar:2.0.0:compile
        [INFO] | | +- (org.apache.spark:spark-network-common_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | +- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:compile
        [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
        [INFO] | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.5:compile
        [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] | +- org.apache.spark:spark-unsafe_2.10:jar:2.0.0:compile
        [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | +- (com.twitter:chill_2.10:jar:0.8.0:compile - omitted for duplicate)
        [INFO] | | +- (com.google.code.findbugs:jsr305:jar:1.3.9:compile - omitted for duplicate)
        [INFO] | | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] | +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile
        [INFO] | | +- (commons-codec:commons-codec:jar:1.3:compile - omitted for conflict with 1.10)
        [INFO] | | - commons-httpclient:commons-httpclient:jar:3.1:compile
        [INFO] | | - (commons-codec:commons-codec:jar:1.2:compile - omitted for conflict with 1.10)
        [INFO] | +- org.apache.curator:curator-recipes:jar:2.4.0:compile
        [INFO] | | +- org.apache.curator:curator-framework:jar:2.4.0:compile
        [INFO] | | | +- org.apache.curator:curator-client:jar:2.4.0:compile
        [INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.4; omitted for duplicate)
        [INFO] | | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
        [INFO] | | | | - (com.google.guava:guava:jar:14.0.1:compile - omitted for conflict with 17.0)
        [INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:compile - omitted for duplicate)
        [INFO] | | | - (com.google.guava:guava:jar:14.0.1:compile - omitted for conflict with 17.0)
        [INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
        [INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.6.1; omitted for duplicate)
        [INFO] | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.6.1; omitted for duplicate)
        [INFO] | | | - (log4j:log4j:jar:1.2.15:compile - omitted for conflict with 1.2.17)
        [INFO] | | - (com.google.guava:guava:jar:14.0.1:compile - omitted for conflict with 17.0)
        [INFO] | +- javax.servlet:javax.servlet-api:jar:3.1.0:compile
        [INFO] | +- org.apache.commons:commons-lang3:jar:3.3.2:compile
        [INFO] | +- (org.apache.commons:commons-math3:jar:3.4.1:compile - omitted for duplicate)
        [INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:compile
        [INFO] | +- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
        [INFO] | +- (org.slf4j:jul-to-slf4j:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
        [INFO] | +- (org.slf4j:jcl-over-slf4j:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
        [INFO] | +- log4j:log4j:jar:1.2.17:compile
        [INFO] | +- (org.slf4j:slf4j-log4j12:jar:1.7.24:compile - version managed from 1.7.16; omitted for duplicate)
        [INFO] | +- com.ning:compress-lzf:jar:1.0.3:compile
        [INFO] | +- org.xerial.snappy:snappy-java:jar:1.1.2.4:compile
        [INFO] | +- net.jpountz.lz4:lz4:jar:1.3.0:compile
        [INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:compile
        [INFO] | +- commons-net:commons-net:jar:2.2:compile
        [INFO] | +- org.scala-lang:scala-library:jar:2.10.6:compile
        [INFO] | +- org.json4s:json4s-jackson_2.10:jar:3.2.11:compile
        [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
        [INFO] | | +- org.json4s:json4s-core_2.10:jar:3.2.11:compile
        [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
        [INFO] | | | +- org.json4s:json4s-ast_2.10:jar:3.2.11:compile
        [INFO] | | | | - (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
        [INFO] | | | +- com.thoughtworks.paranamer:paranamer:jar:2.6:compile
        [INFO] | | | - org.scala-lang:scalap:jar:2.10.0:compile
        [INFO] | | | - org.scala-lang:scala-compiler:jar:2.10.0:compile
        [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.0:compile - omitted for conflict with 2.10.6)
        [INFO] | | | - (org.scala-lang:scala-reflect:jar:2.10.0:compile - omitted for conflict with 2.10.6)
        [INFO] | | - (com.fasterxml.jackson.core:jackson-databind:jar:2.3.1:compile - omitted for conflict with 2.6.5)
        [INFO] | +- org.glassfish.jersey.core:jersey-client:jar:2.22.2:compile
        [INFO] | | +- (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | +- org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile
        [INFO] | | | +- org.glassfish.hk2:hk2-utils:jar:2.4.0-b34:compile
        [INFO] | | | - org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34:compile
        [INFO] | | +- org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile
        [INFO] | | - org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile
        [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2:hk2-utils:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | - org.javassist:javassist:jar:3.18.1-GA:compile
        [INFO] | +- org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile
        [INFO] | | +- (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
        [INFO] | | +- (javax.annotation:javax.annotation-api:jar:1.2:compile - omitted for duplicate)
        [INFO] | | +- org.glassfish.jersey.bundles.repackaged:jersey-guava:jar:2.22.2:compile
        [INFO] | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | - org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:compile
        [INFO] | +- org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile
        [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.jersey.core:jersey-client:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | +- (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
        [INFO] | | +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.22.2:compile
        [INFO] | | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | | +- (org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | | - (org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:compile - omitted for duplicate)
        [INFO] | | +- (javax.annotation:javax.annotation-api:jar:1.2:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | - javax.validation:validation-api:jar:1.1.0.Final:compile
        [INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2:compile
        [INFO] | | +- (org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | - (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
        [INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2:compile
        [INFO] | | +- (org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | +- (org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile - omitted for duplicate)
        [INFO] | | - (javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile - omitted for duplicate)
        [INFO] | +- org.apache.mesos:mesos:jar:shaded-protobuf:0.21.1:compile
        [INFO] | +- io.netty:netty-all:jar:4.0.29.Final:compile
        [INFO] | +- io.netty:netty:jar:3.8.0.Final:compile
        [INFO] | +- com.clearspring.analytics:stream:jar:2.7.0:compile
        [INFO] | +- io.dropwizard.metrics:metrics-core:jar:3.1.2:compile
        [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
        [INFO] | +- io.dropwizard.metrics:metrics-jvm:jar:3.1.2:compile
        [INFO] | | +- (io.dropwizard.metrics:metrics-core:jar:3.1.2:compile - omitted for duplicate)
        [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
        [INFO] | +- io.dropwizard.metrics:metrics-json:jar:3.1.2:compile
        [INFO] | | +- (io.dropwizard.metrics:metrics-core:jar:3.1.2:compile - omitted for duplicate)
        [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.4.2:compile - omitted for conflict with 2.6.5)
        [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
        [INFO] | +- io.dropwizard.metrics:metrics-graphite:jar:3.1.2:compile
        [INFO] | | +- (io.dropwizard.metrics:metrics-core:jar:3.1.2:compile - omitted for duplicate)
        [INFO] | | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.7; omitted for duplicate)
        [INFO] | +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile
        [INFO] | | +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.6.0:compile - omitted for conflict with 2.6.5)
        [INFO] | | - (com.fasterxml.jackson.core:jackson-core:jar:2.6.5:compile - omitted for conflict with 2.8.1)
        [INFO] | +- com.fasterxml.jackson.module:jackson-module-scala_2.10:jar:2.6.5:compile
        [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.6:compile - omitted for duplicate)
        [INFO] | | +- org.scala-lang:scala-reflect:jar:2.10.6:compile
        [INFO] | | | - (org.scala-lang:scala-library:jar:2.10.6:compile - omitted for duplicate)
        [INFO] | | +- (com.fasterxml.jackson.core:jackson-core:jar:2.6.5:compile - omitted for conflict with 2.8.1)
        [INFO] | | +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.6.5:compile - omitted for duplicate)
        [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
        [INFO] | | - com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.6.5:compile
        [INFO] | | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
        [INFO] | | \- (com.thoughtworks.paranamer:paranamer:jar:2.6:compile - omitted for duplicate)
        [INFO] | +- org.apache.ivy:ivy:jar:2.4.0:compile
        [INFO] | +- oro:oro:jar:2.0.8:compile
        [INFO] | +- net.razorvine:pyrolite:jar:4.9:compile
        [INFO] | +- net.sf.py4j:py4j:jar:0.10.1:compile
        [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] +- org.apache.spark:spark-streaming_2.10:jar:2.0.0:compile
        [INFO] | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | +- (org.scala-lang:scala-library:jar:2.10.6:compile - omitted for duplicate)
        [INFO] | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] +- org.apache.spark:spark-sql_2.10:jar:2.0.0:compile
        [INFO] | +- com.univocity:univocity-parsers:jar:2.1.1:compile
        [INFO] | +- org.apache.spark:spark-sketch_2.10:jar:2.0.0:compile
        [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | +- org.apache.spark:spark-catalyst_2.10:jar:2.0.0:compile
        [INFO] | | +- (org.scala-lang:scala-reflect:jar:2.10.6:compile - omitted for duplicate)
        [INFO] | | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | +- (org.apache.spark:spark-unsafe_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | | +- org.codehaus.janino:janino:jar:2.7.8:compile
        [INFO] | | | \- org.codehaus.janino:commons-compiler:jar:2.7.8:compile
        [INFO] | | +- org.antlr:antlr4-runtime:jar:4.5.3:compile
        [INFO] | | +- (commons-codec:commons-codec:jar:1.10:compile - omitted for duplicate)
        [INFO] | | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | +- org.apache.parquet:parquet-column:jar:1.7.0:compile
        [INFO] | | +- org.apache.parquet:parquet-common:jar:1.7.0:compile
        [INFO] | | +- org.apache.parquet:parquet-encoding:jar:1.7.0:compile
        [INFO] | | | +- (org.apache.parquet:parquet-common:jar:1.7.0:compile - omitted for duplicate)
        [INFO] | | | +- org.apache.parquet:parquet-generator:jar:1.7.0:compile
        [INFO] | | | | \- (org.apache.parquet:parquet-common:jar:1.7.0:compile - omitted for duplicate)
        [INFO] | | | \- (commons-codec:commons-codec:jar:1.5:compile - omitted for conflict with 1.10)
        [INFO] | | \- (commons-codec:commons-codec:jar:1.5:compile - omitted for conflict with 1.10)
        [INFO] | +- org.apache.parquet:parquet-hadoop:jar:1.7.0:compile
        [INFO] | | +- (org.apache.parquet:parquet-column:jar:1.7.0:compile - omitted for duplicate)
        [INFO] | | +- org.apache.parquet:parquet-format:jar:2.3.0-incubating:compile
        [INFO] | | +- org.apache.parquet:parquet-jackson:jar:1.7.0:compile
        [INFO] | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.11:compile - omitted for conflict with 1.9.13)
        [INFO] | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.11:compile - omitted for conflict with 1.9.13)
        [INFO] | | \- (org.xerial.snappy:snappy-java:jar:1.1.1.6:compile - omitted for conflict with 1.1.2.4)
        [INFO] | +- (com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile - omitted for duplicate)
        [INFO] | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] +- org.apache.spark:spark-graphx_2.10:jar:2.0.0:compile
        [INFO] | +- (org.apache.spark:spark-core_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | +- (org.apache.xbean:xbean-asm5-shaded:jar:4.4:compile - omitted for duplicate)
        [INFO] | +- com.github.fommil.netlib:core:jar:1.1.2:compile
        [INFO] | | \- (net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile - omitted for duplicate)
        [INFO] | +- net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile
        [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] +- org.apache.spark:spark-mllib-local_2.10:jar:2.0.0:compile
        [INFO] | +- (org.scalanlp:breeze_2.10:jar:0.11.2:compile - omitted for duplicate)
        [INFO] | +- (org.apache.commons:commons-math3:jar:3.4.1:compile - omitted for duplicate)
        [INFO] | +- (org.apache.spark:spark-tags_2.10:jar:2.0.0:compile - omitted for duplicate)
        [INFO] | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] +- org.scalanlp:breeze_2.10:jar:0.11.2:compile
        [INFO] | +- (org.scala-lang:scala-library:jar:2.10.4:compile - omitted for conflict with 2.10.6)
        [INFO] | +- org.scalanlp:breeze-macros_2.10:jar:0.11.2:compile
        [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.4:compile - omitted for conflict with 2.10.6)
        [INFO] | | +- org.scalamacros:quasiquotes_2.10:jar:2.0.0-M8:compile
        [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.3:compile - omitted for conflict with 2.10.6)
        [INFO] | | | \- (org.scala-lang:scala-reflect:jar:2.10.3:compile - omitted for conflict with 2.10.6)
        [INFO] | | \- (org.scala-lang:scala-reflect:jar:2.10.4:compile - omitted for conflict with 2.10.6)
        [INFO] | +- (com.github.fommil.netlib:core:jar:1.1.2:compile - omitted for duplicate)
        [INFO] | +- (net.sourceforge.f2j:arpack_combined_all:jar:0.1:compile - omitted for duplicate)
        [INFO] | +- net.sf.opencsv:opencsv:jar:2.3:compile
        [INFO] | +- com.github.rwl:jtransforms:jar:2.4.0:compile
        [INFO] | +- org.spire-math:spire_2.10:jar:0.7.4:compile
        [INFO] | | +- (org.scala-lang:scala-library:jar:2.10.2:compile - omitted for conflict with 2.10.6)
        [INFO] | | +- org.spire-math:spire-macros_2.10:jar:0.7.4:compile
        [INFO] | | | +- (org.scala-lang:scala-library:jar:2.10.2:compile - omitted for conflict with 2.10.6)
        [INFO] | | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6)
        [INFO] | | | \- (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8)
        [INFO] | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6)
        [INFO] | | \- (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8)
        [INFO] | \- (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate)
        [INFO] +- org.apache.commons:commons-math3:jar:3.4.1:compile
        [INFO] +- org.jpmml:pmml-model:jar:1.2.15:compile
        [INFO] | \- org.jpmml:pmml-schema:jar:1.2.15:compile
        [INFO] +- org.apache.spark:spark-tags_2.10:jar:2.0.0:compile
        [INFO] | \- (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate)
        [INFO] \- org.spark-project.spark:unused:jar:1.0.0:compile

omitted for conflict with 2.10.6) [INFO] | | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6) [INFO] | | | - (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8) [INFO] | | +- (org.scala-lang:scala-reflect:jar:2.10.2:compile - omitted for conflict with 2.10.6) [INFO] | | - (org.scalamacros:quasiquotes_2.10:jar:2.0.0:compile - omitted for conflict with 2.0.0-M8) [INFO] | - (org.slf4j:slf4j-api:jar:1.7.24:compile - version managed from 1.7.5; omitted for duplicate) [INFO] +- org.apache.commons:commons-math3:jar:3.4.1:compile [INFO] +- org.jpmml:pmml-model:jar:1.2.15:compile [INFO] | - org.jpmml:pmml-schema:jar:1.2.15:compile [INFO] +- org.apache.spark:spark-tags_2.10:jar:2.0.0:compile [INFO] | - (org.spark-project.spark:unused:jar:1.0.0:compile - omitted for duplicate) [INFO] - org.spark-project.spark:unused:jar:1.0.0:compile
        chrismattmann Chris A. Mattmann added a comment -

        #1 - Absolutely. I thought putting the model download in Thamme's ModelGetter.groovy script would ensure that the models were available even in proxy environments. Tim, why weren't the models available for you?

        #2 - Sure. Jiminy Christmas, wow, that's a lot of dependencies. What do you think about a tika-nlp module, with this as the first entry?

        tallison@mitre.org Tim Allison added a comment - edited

        1. No idea.

        2. Yes, rather. Tika-app ballooned to 181MB. Sounds good.

        tallison@mitre.org Tim Allison added a comment -

        3. At some point we should follow Konstantin Gribov's fantastic TIKA-2245 work and slf4j-ize logging...

        but that isn't as critical as 2.

        chrismattmann Chris A. Mattmann added a comment -

        Agree on #3. I'm going to take a first cut at tika-nlp. In the future, when we unify our recognisers for Object/Text, we should think about moving the NER stuff from tika-parsers into tika-nlp. I'm not going to bother now, because it would create a situation where people who previously had NER support in tika-app would have to include tika-nlp in the future.

        The other thing I think we should seriously consider: tika-app's size ballooned, as you put it, but who cares? I'll gladly take a 181MB jar file if it gives me capabilities A, B, C, and D all in a box. Two thoughts there. First, we could stop worrying about keeping tika-app so small. Pros: easy, doesn't require anything special. Cons: size aficionados will be disappointed. Second, we could make a tika-app-full module, and a tika-server-full, that is tika-app plus tika-dl and tika-nlp. Thoughts there?
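
        To make the second option concrete, here is a minimal sketch of the dependency section such an aggregate module might declare. The tika-app-full module is only proposed in this comment, so the fragment below is hypothetical; the use of ${project.version} and the exact wiring are assumptions, not an existing Tika POM.

        ```xml
        <!-- Hypothetical tika-app-full/pom.xml fragment (assumption, not an existing file):
             the regular app plus the optional heavyweight modules. -->
        <dependencies>
          <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-app</artifactId>
            <version>${project.version}</version>
          </dependency>
          <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-dl</artifactId>
            <version>${project.version}</version>
          </dependency>
          <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-nlp</artifactId>
            <version>${project.version}</version>
          </dependency>
        </dependencies>
        ```

        With a layout like this, tika-app itself stays small, and only users who opt into the "full" artifact pay for the extra dependencies.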

        tallison@mitre.org Tim Allison added a comment -

        Thought: lower expectations for 2.0 (put off parser composability and arbitrary metadata) and release pretty much as is (once we catch it up to trunk) at the end of the month.

        chrismattmann Chris A. Mattmann added a comment -

        Sounds good to me. Almost done with tika-nlp; will commit shortly.

        hudson Hudson added a comment -

        ABORTED: Integrated in Jenkins build Tika-trunk #1320 (See https://builds.apache.org/job/Tika-trunk/1320/)

        TIKA-1988 – allow for failure to copy age recognition models (tallison: https://github.com/apache/tika/commit/58a602f7c9e4a5666a33726767741be73e10cd09)

        • (edit) tika-parsers/pom.xml

        TIKA-1988 – allow for errors downloading models (tallison: https://github.com/apache/tika/commit/632f52db4713977aa93504517e57b8afe86e6e91)

        • (edit) tika-parsers/src/main/java/org/apache/tika/parser/recognition/AgeRecogniserConfig.java

        hudson Hudson added a comment -

        ABORTED: Integrated in Jenkins build Tika-trunk #1321 (See https://builds.apache.org/job/Tika-trunk/1321/)

        add Tika-NLP module - move AgeRecogniser out of tika-parsers TIKA-1988 (mattmann: https://github.com/apache/tika/commit/e07d9e1de077c2f332094ce5125d1f4cf779d80d)

        • (delete) tika-parsers/src/main/java/org/apache/tika/parser/recognition/AgeRecogniserConfig.java
        • (add) tika-nlp/src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml
        • (edit) tika-parsers/pom.xml
        • (delete) tika-parsers/src/main/java/org/apache/tika/parser/recognition/AgeRecogniser.java
        • (delete) tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-age.xml
        • (add) tika-nlp/src/test/java/org/apache/tika/parser/recognition/AgeRecogniserTest.java
        • (add) tika-nlp/src/main/java/org/apache/tika/parser/recognition/AgeRecogniserConfig.java
        • (delete) tika-parsers/src/test/java/org/apache/tika/parser/recognition/AgeRecogniserTest.java
        • (add) tika-nlp/pom.xml
        • (add) tika-nlp/src/main/java/org/apache/tika/parser/recognition/AgeRecogniser.java

        tallison@mitre.org Tim Allison added a comment -

        Chris A. Mattmann, to confirm, you want the "model" directory at the same level as src?

        tika-nlp/
            model/
                opennlp/
                    en-pos...bin
                    en-sent...bin
                    en-token...bin
                org/
                    apache/
                        ...
            src/
                main/
                    ...
        
        chrismattmann Chris A. Mattmann added a comment -

        For now, yes, Tim Allison, until we fix https://github.com/USCDataScience/AgePredictor/issues/11 in a later 1.1 release.

        tallison@mitre.org Tim Allison added a comment -

        Thank you!

        "Tim why weren't the models available for you?"

        They weren't available because ModelGetter is triggered when one of the model files isn't there. In earlier builds, it was successfully pulled. When I deleted the earlier models, ModelGetter was triggered and all model files were successfully downloaded.

        msharan@usc.edu Madhav Sharan added a comment -

        I faced the same issue as Tim earlier. What do you think about using a Maven plugin to download the models instead of our own script?

        https://github.com/maven-download-plugin/maven-download-plugin

        I checked and it seems to work with proxies too if that's the only issue. https://github.com/maven-download-plugin/maven-download-plugin/issues/74

        I think it could fit better, with no custom code. Open to discussion, though.
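
        For illustration, a minimal sketch of what that plugin configuration could look like in a module's pom.xml. The plugin coordinates and the wget goal come from the maven-download-plugin project linked above; the plugin version, the execution id, the model URL, and the output directory are placeholders chosen for this example, not values from the Tika build.

        ```xml
        <!-- Sketch only: URL, id, version and output directory are hypothetical placeholders. -->
        <plugin>
          <groupId>com.googlecode.maven-download-plugin</groupId>
          <artifactId>download-maven-plugin</artifactId>
          <version>1.3.0</version>
          <executions>
            <execution>
              <id>fetch-model</id>
              <!-- Run before resources are processed so the file is present for packaging. -->
              <phase>generate-resources</phase>
              <goals>
                <goal>wget</goal>
              </goals>
              <configuration>
                <url>https://example.org/models/model.bin</url>
                <outputDirectory>${project.basedir}/model/opennlp</outputDirectory>
              </configuration>
            </execution>
          </executions>
        </plugin>
        ```

        Each model file would need its own execution (or URL), which replaces the custom download script with declarative build configuration.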

        grossws Konstantin Gribov added a comment -

        Tim Allison, my effort to migrate 2.x to slf4j is suspended because I lack spare time for it. I hope to continue next month, but I'm still not sure whether anything will change. Of course, it shouldn't prevent releasing 2.0, because it's mostly internal changes with slight modifications to downstream projects' dependencies.


          People

          • Assignee:
            chrismattmann Chris A. Mattmann
            Reporter:
            msharan@usc.edu Madhav Sharan
          • Votes:
            0
            Watchers:
            6
