  1. Tika
  2. TIKA-2604

Error with certain jar paths on OS X

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.17
    • Fix Version/s: 1.18, 2.0.0
    • Component/s: cli
    • Labels:
      None
    • Environment:

      tika-app-1.17.jar, OS X 10.13.3. 

       

      Description

      I've been developing an R interface to the Tika batch processor for the past month (see: https://github.com/predict-r/rtika), and this software is awesome. I use the command line to call the batch processor, and my code has worked on Ubuntu, Windows 10 and OS X. Several people have been testing my code as well, and it has been working.

      A few days ago I found an issue with the batch processor on OS X. 

      When the batch processor is called with tika-app-1.17.jar on a path that contains spaces, the child process fails to start and Tika keeps restarting it.

      Here is an example of calling the jar when the path has spaces. It produces the following error and the unexpected restarts:

      java -Djava.awt.headless=true -jar '/Users/sasha/Downloads/space folder/tika-app.jar' -maxRestarts 1 -t -i '/' -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'
      
      INFO about to start driver
      INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
      INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
      INFO The child process has finished with an exit value of: 1
      WARN Restarting on unexpected restart code: 1
      WARN Must restart process (exitValue=1 numRestarts=0 receivedRestartMessage=false)
      INFO BatchProcess: Error: Could not find or load main class org.apache.tika.batch.fs.FSBatchProcessCLI
      INFO BatchProcess: Caused by: java.lang.ClassNotFoundException: org.apache.tika.batch.fs.FSBatchProcessCLI
      INFO The child process has finished with an exit value of: 1
      WARN Restarting on unexpected restart code: 1
      WARN Hit the maximum number of process restarts. Driver is shutting down now.
      INFO Process driver has completed

      The error also occurs with double quotes around the jar path.
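
      That both quote styles fail is consistent with how a POSIX shell treats them: either form hands the whole path to java as a single argument, so the split presumably happens later, when the batch driver launches its child process. That last part is an inference from the log above, not something this report confirms. A small sketch of the shell side, using a hypothetical helper named count_args:

```shell
#!/bin/sh
# count_args is a hypothetical helper for illustration only: it prints
# how many arguments the shell passed to it.
count_args() {
    printf '%s\n' "$#"
}

# Single and double quotes both keep the space inside one argument.
count_args '/Users/sasha/Downloads/space folder/tika-app.jar'
count_args "/Users/sasha/Downloads/space folder/tika-app.jar"

# Without quotes, the shell splits the path at the space.
count_args /Users/sasha/Downloads/space folder/tika-app.jar
```

      The first two calls print 1 and the third prints 2, so the quoting on the command line is not what breaks the path.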

      Now, in contrast, calling the jar when the path does not have spaces produces absolutely NO error:

      java -Djava.awt.headless=true -jar '/Users/sasha/Downloads/tika-app.jar' -maxRestarts 1 -t -i '/' -o '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_dircf81200b313e' -fileList '/var/folders/nr/74rgb64s3n98yccxwbv6vsxw0000gn/T/Rtmp9VEJvX/rtika_filecf81530d27ee'
      INFO about to start driver
      INFO BatchProcess: log4j:WARN No appenders could be found for logger (org.apache.tika.batch.fs.FSBatchProcessCLI).
      INFO BatchProcess: log4j:WARN Please initialize the log4j system properly.
      INFO BatchProcess: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
      INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      INFO BatchProcess: WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
      INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
      INFO BatchProcess: for optional dependencies.
      INFO BatchProcess: TIFFImageWriter not loaded. tiff files will not be processed
      INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
      INFO BatchProcess: for optional dependencies.
      INFO BatchProcess: J2KImageReader not loaded. JPEG2000 files will not be processed.
      INFO BatchProcess: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
      INFO BatchProcess: for optional dependencies.
      INFO BatchProcess:
      INFO BatchProcess: Mar 09, 2018 12:19:17 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      INFO BatchProcess: WARNING: org.xerial's sqlite-jdbc is not loaded.
      INFO BatchProcess: Please provide the jar on your classpath to parse sqlite files.
      INFO BatchProcess: See tika-parsers/pom.xml for the correct version.
      INFO BatchProcess: randomCrawl attribute is ignored by FSListCrawler
      BatchProcess:Main thread in TikaFSBatchCLI has finished processing.
      BatchProcess:
      BatchProcess:
      BatchProcess:ParallelFileProcessingResult{considered=1, added=1, consumed=1, numberHandledExceptions=0, secondsElapsed=0.853, exitStatus=0, causeForTermination='COMPLETED_NORMALLY'}
      INFO The child process has finished with an exit value of: 0
      INFO Process driver has completed

      Further, and what makes this a batch processor issue, the path with the space in it produces absolutely NO error in the normal Tika CLI mode either:

      java -jar '/Users/sasha/Downloads/space folder/tika-app.jar' -t /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rtika/extdata/jsonlite.pdf
      
      

      The last two examples work, but the first does not. 

      The only difference is that the first calls the batch processor, which then restarts regardless of the input file.
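
      Since the same jar works from a path without spaces, one possible stopgap is to expose the jar under a space-free path before calling the batch processor. The sketch below is untested against Tika itself; stage_jar is a hypothetical helper, and whether a symlink is enough depends on how Tika derives the child classpath, which this report does not establish. Copying the jar instead of linking it would be the more conservative variant.

```shell
#!/bin/sh
# stage_jar is a hypothetical helper, not part of Tika or rtika.
# It symlinks a jar into a fresh temporary directory and prints the
# link's path, failing if that path would itself contain whitespace.
stage_jar() {
    jar_path=$1
    [ -f "$jar_path" ] || { echo "no such jar: $jar_path" >&2; return 1; }

    # Make the link target absolute so the link resolves from anywhere.
    jar_dir=$(cd "$(dirname "$jar_path")" && pwd) || return 1
    jar_abs=$jar_dir/$(basename "$jar_path")

    # Create the staging directory under TMPDIR (or /tmp if unset).
    tmp_root=${TMPDIR:-/tmp}
    stage_dir=$(mktemp -d "${tmp_root%/}/tika-stage.XXXXXX") || return 1
    case $stage_dir in
        *[[:space:]]*)
            echo "temporary directory contains whitespace: $stage_dir" >&2
            return 1
            ;;
    esac

    ln -s "$jar_abs" "$stage_dir/tika-app.jar" || return 1
    printf '%s\n' "$stage_dir/tika-app.jar"
}
```

      A caller would capture the printed path, for example jar=$(stage_jar '/Users/sasha/Downloads/space folder/tika-app.jar'), and pass "$jar" to java -jar in place of the original path.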

       

            People

            • Assignee:
              tallison Tim Allison
              Reporter:
              goodmansasha Sasha Goodman