Tika / TIKA-2963

Tika throws OOM errors when extracting large .xlsx files


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.20
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

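      For docx and pptx files, Tika can be configured to use a SAX parser to improve extraction performance. However, Tika still runs into OOM errors when extracting large .xlsx files. I have not found an official solution so far, so I am attaching my own code, which is also based on a SAX parser. Its parameters can be tuned to suit the actual workload. It surely has shortcomings; criticism and corrections are welcome. Thanks.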
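To illustrate the general idea of SAX-based .xlsx extraction, here is a minimal sketch that uses only the JDK. It is not the attached demo.java and not Tika's own implementation; the class name `XlsxSheetSaxSketch` and the method `streamCells` are hypothetical. It assumes the worksheet XML (for example `xl/worksheets/sheet1.xml` inside the .xlsx zip) has already been opened as a stream, and it does not resolve shared strings: for cells of type `s` it emits the raw index stored in `<v>`.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: streams SpreadsheetML worksheet XML cell by cell,
// so memory use does not grow with the number of rows in the sheet.
final class XlsxSheetSaxSketch {

    /** Passes the text of each non-empty cell to sink as soon as the cell element closes. */
    static void streamCells(InputStream sheetXml, Consumer<String> sink) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so localName is populated
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.newSAXParser().parse(sheetXml, new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                private boolean inValue;

                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if ("c".equals(local)) {
                        text.setLength(0);          // new cell: reset the buffer
                    } else if ("v".equals(local) || "t".equals(local)) {
                        inValue = true;             // <v> holds values, <t> holds inline text
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (inValue) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    if ("v".equals(local) || "t".equals(local)) {
                        inValue = false;
                    } else if ("c".equals(local) && text.length() > 0) {
                        sink.accept(text.toString()); // hand the cell off; nothing is retained
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Failed to stream worksheet XML", e);
        }
    }
}
```

Because the handler keeps only the current cell's text in memory, the caller decides what to retain, for example by writing each value straight to an output stream instead of collecting it in a list.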

      Attachments

        1. demo.java
          8 kB
          Feng Jiao Jiang


          People

            Assignee: Unassigned
            Reporter: Feng Jiao Jiang (Mr_Jiang)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: