Tika / TIKA-3154

Exception while extracting msg files

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.24.1
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      When parsing an .msg file that contains HTML text, Tika throws an exception.

      Command: java -jar tika-app-1.24.1.jar html_code.msg

      Exception:

      Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      WARNING: org.xerial's sqlite-jdbc is not loaded.
      Please provide the jar on your classpath to parse sqlite files.
      See tika-parsers/pom.xml for the correct version.
      Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
      	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
      	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
      Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
      If the file is not corrupt, please open an issue on bugzilla to request
      increasing the maximum allowable size for this record type.
      As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
      	at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
      	at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
      	at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
      	at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
      	at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
      	at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
      	at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
      	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
      	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
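
The exception message itself points to IOUtils.setByteArrayMaxOverride() as a temporary workaround. The sketch below shows one way that might be applied. It assumes Tika is embedded as a library (the override cannot be set this way when running the tika-app jar from the command line) and that Apache POI is on the classpath. The class name PoiLimitWorkaround and both helper methods are hypothetical. The POI call is made through reflection only so that the sketch compiles without POI; in a real project a direct call to org.apache.poi.util.IOUtils.setByteArrayMaxOverride(int) would be simpler. As far as I understand, the override applies to the whole JVM, not just to msg files, so it should be treated as a stopgap.

```java
import java.lang.reflect.Method;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Tika or POI.
class PoiLimitWorkaround {

    // Matches the requested length in POI's RecordFormatException message.
    private static final Pattern REQUESTED =
            Pattern.compile("array of length (\\d+)");

    /** Returns the array length named in the message, or -1 if none is found. */
    static int requestedLength(String message) {
        if (message == null) {
            return -1;
        }
        Matcher m = REQUESTED.matcher(message);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /**
     * Tries to call org.apache.poi.util.IOUtils.setByteArrayMaxOverride(maxBytes).
     * Returns true if the call succeeded, false if POI (or the method) is not
     * available on the classpath.
     */
    static boolean raiseLimit(int maxBytes) {
        try {
            Class<?> ioUtils = Class.forName("org.apache.poi.util.IOUtils");
            Method setter = ioUtils.getMethod("setByteArrayMaxOverride", int.class);
            setter.invoke(null, maxBytes); // static method, so no receiver
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String msg = "Tried to allocate an array of length 1326748, "
                + "but 1000000 is the maximum for this record type.";
        int needed = requestedLength(msg);
        // Doubling the reported length is an arbitrary choice of headroom.
        boolean applied = raiseLimit(needed * 2);
        System.out.println("requested=" + needed + ", override applied=" + applied);
    }
}
```

The override would need to be set before the first call to the Tika parser. Whether a 1.3 MB RTF body should be accepted by default is a separate question for the POI and Tika maintainers.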
      

            People

              Assignee: Unassigned
              Reporter: Akash (akki1607)
              Votes: 0
              Watchers: 2
