Tika / TIKA-3097

Out of memory while parsing docx


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 1.24
    • Fix Version/s: None
    • Component/s: core, parser
    • Labels: None

    Description

      I have written simple Scala code to extract the content from an uploaded .docx file. The JVM runs out of memory (OOM) when Tika tries to parse the file. I configured the JVM heap to 1 GB and then to 2 GB, and the same issue occurs in both cases. It happens both when running the Tika jar and in my own code.
      I have attached the file for reference.

      Attachments

        1. samplefile.txt (21.90 MB), attached by suchendra
        2. Screenshot from 2020-05-07 08-14-25.png (24 kB), attached by Tim Allison
        3. test.docx (4.14 MB), attached by suchendra


          People

            Assignee: Unassigned
            Reporter: suchendra
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: