Tika / TIKA-2908

TikaException: Failed to close temporary resource - how to fix?

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.21
    • Fix Version/s: 1.22
    • Component/s: ocr, parser
    • Labels:
    • Flags: Important

      Description

      I am using Apache Tika on Windows 10 with JRE 1.8.0_181, and I imported Tika through Maven with the following dependencies:

      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-parsers</artifactId>
          <version>1.21</version>
        </dependency>
      </dependencies>

      I have the code below for performing OCR using Tesseract (which I have independently tested and know to be working):

      public static void OCRTest() {
          try {
              BufferedImage im = ImageIO.read(new File(OCR_IMAGE));

              TesseractOCRConfig config = new TesseractOCRConfig();
              config.setTessdataPath("C:\\Program Files\\Tesseract-OCR\\tessdata");
              config.setTesseractPath("C:\\Program Files\\Tesseract-OCR");

              ParseContext parseContext = new ParseContext();
              parseContext.set(TesseractOCRConfig.class, config);

              TesseractOCRParser parser = new TesseractOCRParser();
              BodyContentHandler handler = new BodyContentHandler();
              Metadata metadata = new Metadata();

              try {
                  parser.parse(im, handler, metadata, parseContext);
                  System.out.println(handler.toString());
              } catch (SAXException e) {
                  e.printStackTrace();
              } catch (TikaException e) {
                  e.printStackTrace();
              }
          } catch (IOException e) {
              e.printStackTrace();
          }
      }

      I run into the following exception:

      org.apache.tika.exception.TikaException: Failed to close temporary resources
          at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:174)
          at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:251)
          at test.test.App.OCRTest(App.java:46)
          at test.test.App.main(App.java:30)
      Caused by: java.nio.file.FileSystemException: C:\Users\m\AppData\Local\Temp\apache-tika-2643805894084124300.tmp: The process cannot access the file because it is being used by another process.

      The .tmp file is still present in the Temp folder. I downloaded the source code and stepped through it with the debugger: the error is raised when Tika attempts to close the temporary file. Another issue on this tracker (https://issues.apache.org/jira/browse/TIKA-1732) reports the same exception, although with the AutoDetectParser rather than Tesseract. That problem appeared to be a conflict among the imported jars, but I hit this one even with only the Apache Tika libraries installed. I suspect a concurrency issue, but I can't pinpoint the conflict.

      I don't run into this issue when using Tika's AutoDetectParser, only with the TesseractOCRParser. This is an important part of an application I'm working on, so I would really appreciate any insights on how to proceed.
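
      The "Caused by" line in the trace above says the temporary file was still in use when its deletion was attempted. On Windows, a file typically cannot be deleted while a java.io.FileOutputStream on it is still open, so one plausible reading is that cleanup ran before a stream was closed. That is an assumption; the report does not confirm the root cause. The sketch below uses only JDK classes, not Tika's API, and the names TempFileCleanup and writeThenDelete are hypothetical. It shows the safe ordering: close the stream first, then delete.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of Tika.
final class TempFileCleanup {

    /**
     * Writes data to a fresh temporary file, closes the stream, and only then
     * deletes the file. Returns true if the file no longer exists afterwards.
     */
    public static boolean writeThenDelete(byte[] data) {
        try {
            Path tmp = Files.createTempFile("tika-demo-", ".tmp");
            // try-with-resources closes the stream before the delete below runs.
            try (OutputStream out = new FileOutputStream(tmp.toFile())) {
                out.write(data);
            }
            // Calling Files.delete while the stream above was still open is the
            // ordering that can raise FileSystemException on Windows.
            Files.delete(tmp);
            return Files.notExists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(writeThenDelete(data));
    }
}
```

      If the ordering is reversed, so that the delete happens inside the try block while the stream is open, the same code may still succeed on Linux or macOS, which would explain why a problem of this kind shows up only on Windows.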

            People

            • Assignee: Tim Allison (tallison@apache.org)
            • Reporter: Marichi Gupta (marichi)
            • Votes: 0
            • Watchers: 3
