  1. Tika
  2. TIKA-521

OutOfMemoryError Parsing XLSX File

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7, 0.8
    • Fix Version/s: 0.10
    • Component/s: parser
    • Labels: None

    Description

      I have several XLSX files that I'm trying to parse with Tika, and they fail with an OutOfMemoryError even when using a large heap size. For instance, the attached 1.26MB Excel file fails with a 512MB heap.

      Attachments

        1. Out of memory issue in 1.0.jpg
          229 kB
          samraj
        2. Out of memory issue in 1.0.jpg
          229 kB
          samraj
        3. TikaExcelEventBasedExtraction.diff
          21 kB
          Nick Burch
        4. tika-diff.txt
          2 kB
          Sjoerd Smeets
        5. tika-new-files.tar.bz2
          5 kB
          Sjoerd Smeets
        6. memory-test.xlsx
          1.27 MB
          Stephen Charles Duncan, Jr.

          People

            Assignee: Nick Burch (nick)
            Reporter: Stephen Charles Duncan, Jr. (jrduncans)
            Votes: 1
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: