Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-3809

OutOfMemoryError occurs while reading doc file

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Duplicate
    • 1.23
    • None
    • app
    • None
    • Important

    Description

      OutOfMemoryError occurs while parsing a docx file of size 8 MB (uncompressed size 250 MB). while analyzing the heapdump(.hprof), the thread that parses the file consumes about 750 MB heap size. while looking into a dominator_tree,

      org.apache.xmlbeans.impl.store.Xobj$ElementXobj
      

       This object has been created many times!

      I've also attached the stacktrace,

      at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Lorg/apache/xmlbeans/impl/store/Locale;Ljavax/xml/namespace/QName;Ljavax/xml/namespace/QName;)Lorg/apache/xmlbeans/impl/store/Xobj; (Cur.java:260)
        at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Ljavax/xml/namespace/QName;)V (Cur.java:2997)
        at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Lorg/xml/sax/Attributes;)V (Locale.java:3164)
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(Lorg/apache/xerces/xni/QName;Lorg/apache/xerces/xni/XMLAttributes;Lorg/apache/xerces/xni/Augmentations;)V (Unknown Source)
        at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Lorg/apache/xerces/xni/QName;Lorg/apache/xerces/xni/XMLAttributes;Lorg/apache/xerces/xni/Augmentations;)V (Unknown Source)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement()Z (Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Z)Z (Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Z)Z (Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Z)Z (Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Lorg/apache/xerces/xni/parser/XMLInputSource;)V (Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Lorg/apache/xerces/xni/parser/XMLInputSource;)V (Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Lorg/xml/sax/InputSource;)V (Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Lorg/xml/sax/InputSource;)V (Unknown Source)
        at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Lorg/apache/xmlbeans/impl/store/Locale;Lorg/xml/sax/InputSource;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/impl/store/Cur; (Locale.java:3422)
        at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Ljava/io/InputStream;Lorg/apache/xmlbeans/SchemaType;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/XmlObject; (Locale.java:1272)
        at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Lorg/apache/xmlbeans/SchemaTypeLoader;Ljava/io/InputStream;Lorg/apache/xmlbeans/SchemaType;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/XmlObject; (Locale.java:1259)
        at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(Ljava/io/InputStream;Lorg/apache/xmlbeans/SchemaType;Lorg/apache/xmlbeans/XmlOptions;)Lorg/apache/xmlbeans/XmlObject; (SchemaTypeLoaderBase.java:345)
        at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Ljava/io/InputStream;Lorg/apache/xmlbeans/XmlOptions;)Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/DocumentDocument; (Unknown Source)
        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead()V (XWPFDocument.java:178)
        at org.apache.poi.ooxml.POIXMLDocument.load(Lorg/apache/poi/ooxml/POIXMLFactory;)V (POIXMLDocument.java:184)
        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(Lorg/apache/poi/openxml4j/opc/OPCPackage;)V (XWPFDocument.java:138)
        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(Lorg/apache/poi/openxml4j/opc/OPCPackage;)V (XWPFWordExtractor.java:60)
        at org.apache.poi.ooxml.extractor.ExtractorFactory.createExtractor(Lorg/apache/poi/openxml4j/opc/OPCPackage;)Lorg/apache/poi/extractor/POITextExtractor; (ExtractorFactory.java:224)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLExtractorFactory.java:170)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLParser.java:113)
        at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AutoDetectParser.java:143) 

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            earl3005 earl
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: