Tika / TIKA-521

OutOfMemoryError Parsing XLSX File

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7, 0.8
    • Fix Version/s: 0.10
    • Component/s: parser
    • Labels: None

    Description

      I have several XLSX files that I'm trying to parse with Tika, and they fail with an OutOfMemoryError even with a large heap size. For instance, the attached 1.26 MB Excel file fails with a 512 MB heap.
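
      An XLSX file is a ZIP container of XML parts, so its on-disk size can understate how much data a parser has to handle once the parts are inflated. The following is a minimal JDK-only sketch for checking this. It is not part of Tika or of the original report, and it does not establish the cause of this particular failure. The class name XlsxSizeReport and the method totalSizes are hypothetical names chosen for illustration, and the default file name is simply the attachment name from this issue.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not a Tika API): compares the compressed and
// uncompressed sizes of the parts inside an XLSX (ZIP) container.
public class XlsxSizeReport {

    /** Returns {compressedBytes, uncompressedBytes} summed over all ZIP entries. */
    static long[] totalSizes(Path xlsx) throws IOException {
        long compressed = 0;
        long uncompressed = 0;
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Both getters return -1 when the size is unknown; skip those values.
                if (entry.getCompressedSize() > 0) {
                    compressed += entry.getCompressedSize();
                }
                if (entry.getSize() > 0) {
                    uncompressed += entry.getSize();
                }
            }
        }
        return new long[] {compressed, uncompressed};
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the attachment has been saved next to the program.
        Path file = Paths.get(args.length > 0 ? args[0] : "memory-test.xlsx");
        long[] sizes = totalSizes(file);
        System.out.printf("compressed=%d bytes, uncompressed=%d bytes, max heap=%d bytes%n",
                sizes[0], sizes[1], Runtime.getRuntime().maxMemory());
    }
}
```

      Running it against a failing file shows the inflated XML size alongside the JVM's maximum heap. A parser that builds the whole sheet in memory typically needs more than the raw XML size, so the comparison only gives a rough lower bound.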

      Attachments

        1. memory-test.xlsx
          1.27 MB
          Stephen Charles Duncan, Jr.
        2. Out of memory issue in 1.0.jpg
          229 kB
          samraj
        3. Out of memory issue in 1.0.jpg
          229 kB
          samraj
        4. tika-diff.txt
          2 kB
          Sjoerd Smeets
        5. TikaExcelEventBasedExtraction.diff
          21 kB
          Nick Burch
        6. tika-new-files.tar.bz2
          5 kB
          Sjoerd Smeets

          People

            Assignee: Nick Burch
            Reporter: Stephen Charles Duncan, Jr.
            Votes: 1
            Watchers: 2