  1. Tika
  2. TIKA-1152

Process loops infinitely on parsing of a CHM file

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels: None
    • Environment: Windows/Linux

    Description

      When parsing the attached CHM file (Microsoft Compiled HTML Help), the Java process hangs and never returns. Stack of the main thread while it is stuck:

      Thread[main,5,main]
      
      	org.apache.tika.parser.chm.lzx.ChmLzxBlock.extractContent(ChmLzxBlock.java:203)
      	org.apache.tika.parser.chm.lzx.ChmLzxBlock.<init>(ChmLzxBlock.java:77)
      	org.apache.tika.parser.chm.core.ChmExtractor.extractChmEntry(ChmExtractor.java:338)
      	org.apache.tika.parser.chm.CHMDocumentInformation.getContent(CHMDocumentInformation.java:72)
      	org.apache.tika.parser.chm.CHMDocumentInformation.getText(CHMDocumentInformation.java:141)
      	org.apache.tika.parser.chm.CHM2XHTML.process(CHM2XHTML.java:34)
      	org.apache.tika.parser.chm.ChmParser.parse(ChmParser.java:51)
      	org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
      	org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	org.apache.tika.parser.AbstractParser.parse(AbstractParser.java:53)
      	com.polyspot.document.converter.DocumentConverter.realizeConversion(DocumentConverter.java:192)
      ...
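
      Until a fixed version is in use, a caller can bound the time spent in a parse call so that one bad file does not block the whole process. The following is a minimal sketch of that idea, not the fix attached to this issue. `ParseWithTimeout` and `runWithTimeout` are hypothetical names that do not exist in Tika, and the parse call is represented by a generic `Callable` so the example needs only the JDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika.
final class ParseWithTimeout {

    /**
     * Runs the task on a daemon worker thread and waits at most timeoutMillis.
     * Returns the task's result, or null if the task did not finish in time.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a loop that never
            // checks the interrupt flag keeps running in the background.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a parse call that completes normally.
        String ok = runWithTimeout(() -> "done", 1000);
        System.out.println("fast task: " + ok);

        // Stand-in for a parse call that never returns.
        Callable<String> stuck = () -> {
            while (true) {
                Thread.sleep(10);
            }
        };
        String hung = runWithTimeout(stuck, 200);
        System.out.println("stuck task timed out: " + (hung == null));
    }
}
```

      In real use the `Callable` would wrap the parser invocation. Because interruption cannot stop a CPU-bound loop that ignores it, this guard only frees the caller; the worker thread may keep consuming CPU, so running the parse in a separate process is the more robust option.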
      

      Attachments

        1. ChmLzxBlock.java.patch
          2 kB
          Hong-Thai Nguyen
        2. eventcombmt.chm
          100 kB
          Hong-Thai Nguyen

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Hong-Thai Nguyen (thaichat04)
            Votes: 0
            Watchers: 3