  Tika / TIKA-1638

Make ExternalParser actually work


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: parser
    • Labels:
      None

      Description

      Several issues in ExternalParser currently prevent it from working. They are listed below:

      • the class org.apache.tika.parser.external.CompositeExternalParser needs to be added to the META-INF/services/org.apache.tika.parser.Parser file so that it is discovered at runtime.
      • the ExternalParserConfigReader class incorrectly tokenizes the comma-separated error codes of the check command: it uses a StringTokenizer with the default delimiter set (space, tab, newline, carriage return and form feed), which does not include ",", so the whole list comes back as a single token.
      • before adding Parsers, the ExternalParserConfigReader runs a check in which it simply wraps the given check command String in a String[]. The check therefore fails whenever the command contains spaces (which, according to the documentation, most commands will), because the entire string, spaces included, is treated as the name of the executable. The command needs to be .split(" ") on whitespace for the check to pass and for ExternalParsers to actually be created and added.
      • the ExternalParser itself also needs to split its command on whitespace, in the same way as the ExternalParserConfigReader, before executing it; otherwise most commands cannot be executed.
      • exception handling needs to be added around the exec call that runs the external command.
      • any Threads started in, e.g., extractMetadata and sendInput need to be started and then joined, so that they finish before the method continues. As it stands, metadata is sometimes extracted and sometimes not, because the extraction runs on threads that are not forced to complete before the method moves on, parses and returns.
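      The two ExternalParserConfigReader problems above (comma-separated error codes and a check command containing spaces) can be reproduced with plain JDK classes. The following is a minimal sketch, not the actual patch: the class and method names (ExternalConfigParsingSketch, parseErrorCodes, splitCommand) and the sample values "126,127" and "ffmpeg -version" are hypothetical. It splits on the regex \s+ rather than a single space so that repeated spaces do not produce empty arguments; like any plain whitespace split, it does not handle quoted arguments that contain spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Hypothetical illustration of the two config-reader fixes; not Tika source code.
 */
public class ExternalConfigParsingSketch {

    /** Parses a comma-separated error-code list such as "126,127". */
    public static List<Integer> parseErrorCodes(String codes) {
        List<Integer> result = new ArrayList<>();
        // Pass "," explicitly: the default delimiters are " \t\n\r\f", so
        // new StringTokenizer("126,127") yields the single token "126,127".
        StringTokenizer tokens = new StringTokenizer(codes, ",");
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken().trim();
            if (!token.isEmpty()) {
                result.add(Integer.parseInt(token));
            }
        }
        return result;
    }

    /** Splits a command line into an argv-style array on runs of whitespace. */
    public static String[] splitCommand(String command) {
        // new String[] { command } would make exec() look for one executable
        // literally named "ffmpeg -version"; splitting yields {"ffmpeg", "-version"}.
        // Quoted arguments containing spaces are NOT handled by this sketch.
        return command.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(new StringTokenizer("126,127").countTokens());     // 1
        System.out.println(parseErrorCodes("126,127"));                       // [126, 127]
        System.out.println(Arrays.toString(splitCommand("ffmpeg -version"))); // [ffmpeg, -version]
    }
}
```

      The first line of output shows the original bug: with the default delimiters the comma-separated list is a single token.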

      I have a patch that fixes all of this; it is forthcoming.
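      The ExternalParser items in the description (running an already-split command, handling exec failures, and starting then joining helper threads) follow a common JDK pattern. The sketch below only illustrates that pattern under stated assumptions and is not the patch: ExternalCommandSketch, startPump and run are hypothetical names; it uses the standard Runtime.exec(String[]) and Thread.join() calls; and the demo assumes a POSIX system with `cat` on the PATH.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not Tika source code): exec a pre-split command with
 * explicit error handling, and join every stream-copying thread before the
 * captured output is used.
 */
public class ExternalCommandSketch {

    /** Starts a thread that copies in to out, optionally closing out when done. */
    static Thread startPump(InputStream in, OutputStream out, boolean closeOut) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[8192];
            try {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (IOException e) {
                // The child may close its pipes early, e.g. if it exits without reading all input.
            } finally {
                if (closeOut) {
                    try {
                        out.close(); // signals EOF on the child's stdin
                    } catch (IOException ignored) {
                        // nothing more to do
                    }
                }
            }
        });
        t.start();
        return t;
    }

    /** Runs the command with the given stdin bytes and returns its stdout as UTF-8. */
    public static String run(String[] command, byte[] input) throws IOException, InterruptedException {
        Process process;
        try {
            process = Runtime.getRuntime().exec(command);
        } catch (IOException e) {
            // e.g. the executable is not on the PATH: report it instead of failing silently
            throw new IOException("Unable to run: " + String.join(" ", command), e);
        }
        ByteArrayOutputStream stdout = new ByteArrayOutputStream();
        ByteArrayOutputStream stderr = new ByteArrayOutputStream(); // drained so the child never blocks on it

        Thread feeder = startPump(new ByteArrayInputStream(input), process.getOutputStream(), true);
        Thread outReader = startPump(process.getInputStream(), stdout, false);
        Thread errReader = startPump(process.getErrorStream(), stderr, false);

        // Without these joins the caller could read stdout before the copies finish.
        feeder.join();
        outReader.join();
        errReader.join();
        process.waitFor();
        return new String(stdout.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Assumes a POSIX system with `cat` on the PATH.
        System.out.println(run("cat".split(" "), "hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

      Because Thread.join() establishes that each copy has finished, everything read from the child's stdout is guaranteed to be in the buffer before it is decoded, which is the start-then-join discipline the description asks for in extractMetadata and sendInput.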

        Attachments

        1. TIKA-1638.Mattmann.052515.patch.txt
          5 kB
          Chris A. Mattmann


              People

              • Assignee: Chris A. Mattmann
              • Reporter: Chris A. Mattmann
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved: