TIKA-1925

Composite External Parser like Exiftool fails to run on Windows.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.12
    • Fix Version/s: 1.14
    • Component/s: core
    • Labels:
      None
    • Environment:

      Windows 10, Intel i7 6550U 64-Bit processor

      Description

      While trying to run the EXIFTool parser through Tika on Windows, we get the following error output.
      (Ref: http://wiki.apache.org/tika/EXIFToolParser)

      java.io.IOException: Cannot run program "env": CreateProcess error=2, The system cannot find the file specified
      at java.lang.ProcessBuilder.start(Unknown Source)
      at java.lang.Runtime.exec(Unknown Source)
      at java.lang.Runtime.exec(Unknown Source)
      at org.apache.tika.parser.external.ExternalParser.parse(ExternalParser.java:182)
      at org.apache.tika.parser.external.ExternalParser.parse(ExternalParser.java:145)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:177)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
      at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
      at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
      Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
      at java.lang.ProcessImpl.create(Native Method)
      at java.lang.ProcessImpl.<init>(Unknown Source)
      at java.lang.ProcessImpl.start(Unknown Source)
      ... 13 more
      Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.external.ExternalParser@51efea79
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
      at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:177)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:190)
      at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:491)
      at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:144)
      Caused by: java.lang.NullPointerException
      at org.apache.tika.parser.external.ExternalParser.parse(ExternalParser.java:218)
      at org.apache.tika.parser.external.ExternalParser.parse(ExternalParser.java:145)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      ... 7 more

      After analyzing the stack trace and a little experimentation, we found that "env" is a Unix/Linux/Mac OS X-specific command that does not exist on Windows, so the process cannot be started.
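
One way to avoid depending on "env" at all is to strip it from the command line and pass any NAME=value assignments through the JVM's own process environment instead. The following is only a minimal sketch under that assumption: `PortableCommand` and `build` are hypothetical names, not part of Tika, and the sketch does not handle `env` options such as `-i` or program names that themselves contain `=`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Tika): makes an "env ..." command line portable. */
final class PortableCommand {

    /**
     * Drops a leading "env" token and moves the NAME=value tokens that
     * follow it into the process environment, so only the real program
     * and its arguments are handed to the operating system.
     */
    static ProcessBuilder build(String... cmd) {
        List<String> args = new ArrayList<>(Arrays.asList(cmd));
        Map<String, String> extraEnv = new LinkedHashMap<>();

        if (!args.isEmpty() && args.get(0).equals("env")) {
            args.remove(0);
            // Consume assignments such as LANG=C until the program name is reached.
            while (!args.isEmpty() && args.get(0).indexOf('=') > 0) {
                String[] kv = args.remove(0).split("=", 2);
                extraEnv.put(kv[0], kv[1]);
            }
        }

        ProcessBuilder pb = new ProcessBuilder(args);
        pb.environment().putAll(extraEnv);
        return pb;
    }

    public static void main(String[] argv) {
        ProcessBuilder pb = build("env", "LANG=C", "exiftool", "-ver");
        // Show the command that would actually be started.
        System.out.println(pb.command());
    }
}
```

Because the variables are set through `ProcessBuilder.environment()`, the same command definition can be used on Windows and on Unix-like systems without an OS-specific branch.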

      We were able to work around this problem by adding some Windows-specific code, recompiling Tika, and running it again with the same setup. I am attaching the original and modified files for review.

      If the fix meets Tika's standards, I can send a pull request on GitHub to contribute the patch.

      1. ExternalParser_modified.java
        14 kB
        Nilay Chheda
      2. ExternalParser_orig.java
        13 kB
        Nilay Chheda

          Activity

          mit2nil Nilay Chheda added a comment -

          Original file taken from latest code from Github

          mit2nil Nilay Chheda added a comment -

          Changes made to handle Windows, keeping non-Windows code execution intact.

          githubbot ASF GitHub Bot added a comment -

          GitHub user mit2nil opened a pull request:

          https://github.com/apache/tika/pull/96

          fix for TIKA-1925 contributed by Nilay Chheda

          @chrismattmann Please review the changes and let me know if they can be contributed back to Tika.
          Issue description: https://issues.apache.org/jira/browse/TIKA-1925

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/mit2nil/tika TIKA-1925

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tika/pull/96.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #96


          commit c6e2b028beed78e66a80e7e22bc5d9f74b240dbe
          Author: mit2nil <mit2nil@gmail.com>
          Date: 2016-04-02T02:11:08Z

          fix for TIKA-1925 contributed by Nilay Chheda


          gagravarr Nick Burch added a comment -

          Your patch seems to be making some very large changes, and at first glance I worry you might be tackling a symptom not the cause...

          The exiftool check doesn't need env, so I'm not sure why we're needing it to run the tool? And if we do, to avoid path issues, maybe we should be introducing a special variable / substitution to enable it on unixes?

          Could you perhaps check the source control history and associated jiras for the change that introduced this, and see if we can find an alternate route to solve this based on why we introduced it to start with?

          (Well, or just wait for the announced ubuntu/bash on Windows to come out, and use that!)

          mit2nil Nilay Chheda added a comment -

          Yes, I agree with you about finding the origin of the problem. In fact, my first approach was to look for a hard-coded "env" command in the Tika source, but unfortunately I couldn't find one. So I had to use a workaround to get my assignment done for Prof. Mattmann's class at USC.

          As far as ubuntu/bash is concerned, it is going to be available only on the latest Windows 10 builds (thanks to a new subsystem quietly placed into Windows 10 build 14251 back in January; the lxcore.sys and lxss.sys files form the new “Windows Subsystem for Linux” (WSL)). Why would we want to restrict Tika to a select few environments when it is written in Java to be platform agnostic?

          I will give it another try and post an update if I can get a cleaner patch.

          chrismattmann Chris A. Mattmann added a comment -

          Haven't heard back since April. Closing.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tika/pull/96


            People

            • Assignee:
              chrismattmann Chris A. Mattmann
              Reporter:
              mit2nil Nilay Chheda
            • Votes:
              0
              Watchers:
              4
