ManifoldCF / CONNECTORS-1518

MCF shutting down when Tika is used

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 2.10
    • Fix Version/s: ManifoldCF 2.11
    • Component/s: Tika extractor
    • Labels: None

    Description

        ```
      Jul 26, 2018 1:21:51 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
      WARNING: org.xerial's sqlite-jdbc is not loaded.
      Please provide the jar on your classpath to parse sqlite files.
      See tika-parsers/pom.xml for the correct version.
      agents process ran out of memory - shutting down
      java.lang.OutOfMemoryError: Java heap space
      at java.base/java.util.Arrays.copyOf(Arrays.java:3816)
      at java.base/java.util.BitSet.ensureCapacity(BitSet.java:338)
      at java.base/java.util.BitSet.expandTo(BitSet.java:353)
      at java.base/java.util.BitSet.set(BitSet.java:448)
      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
      at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
      at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
      at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
      at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
      at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
      at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
      at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
      at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@37095ded{HTTP/1.1}{0.0.0.0:8345}
      [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@5a6d5a8f{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-14189461872304124764.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
      [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6979efad{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-11619445383548662284.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
      2018-07-26 13:22:47,170 qtp2061226112-492 FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED
      at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
      at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
      at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
      at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
      at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
      at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
      at org.apache.logging.log4j.LogManager.getContext(LogManager.java:270)
      at org.apache.log4j.Logger$PrivateManager.getContext(Logger.java:59)
      at org.apache.log4j.Logger.getLogger(Logger.java:37)
      at org.apache.velocity.runtime.log.Log4JLogChute.init(Log4JLogChute.java:72)
      at org.apache.velocity.runtime.log.LogManager.createLogChute(LogManager.java:157)
      at org.apache.velocity.runtime.log.LogManager.updateLog(LogManager.java:269)
      at org.apache.velocity.runtime.RuntimeInstance.initializeLog(RuntimeInstance.java:871)
      at org.apache.velocity.runtime.RuntimeInstance.init(RuntimeInstance.java:262)
      at org.apache.velocity.runtime.RuntimeInstance.requireInitialization(RuntimeInstance.java:302)
      at org.apache.velocity.runtime.RuntimeInstance.getTemplate(RuntimeInstance.java:1531)
      at org.apache.velocity.app.VelocityEngine.mergeTemplate(VelocityEngine.java:343)
      at org.apache.manifoldcf.ui.i18n.Messages.outputResourceWithVelocity(Messages.java:159)
      at org.apache.manifoldcf.agents.transformation.tika.Messages.outputResourceWithVelocity(Messages.java:136)
      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.outputSpecificationBody(TikaExtractor.java:544)
      at org.apache.jsp.editjob_jsp._jspService(editjob_jsp.java:3002)
      at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
      at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
      at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
      at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
      at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
      at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
      at org.eclipse.jetty.server.Server.handle(Server.java:497)
      at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
      at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
      at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
      at java.base/java.lang.Thread.run(Thread.java:844)
      [Worker thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry SummaryInformation
      java.lang.RuntimeException: java.nio.channels.ClosedByInterruptException
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)
      at org.apache.poi.poifs.filesystem.NPOIFSStream.getBlockIterator(NPOIFSStream.java:95)
      at org.apache.poi.poifs.filesystem.NPOIFSDocument.getBlockIterator(NPOIFSDocument.java:179)
      at org.apache.poi.poifs.filesystem.NDocumentInputStream.<init>(NDocumentInputStream.java:82)
      at org.apache.poi.poifs.filesystem.DocumentInputStream.<init>(DocumentInputStream.java:65)
      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
      at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
      Caused by: java.nio.channels.ClosedByInterruptException
      at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:199)
      at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:388)
      at org.apache.poi.poifs.nio.FileBackedDataSource.size(FileBackedDataSource.java:137)
      at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getChainLoopDetector(NPOIFSFileSystem.java:627)
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)
      ... 21 more
      [Worker thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
      java.lang.RuntimeException: java.nio.channels.ClosedChannelException
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)
      at org.apache.poi.poifs.filesystem.NPOIFSStream.getBlockIterator(NPOIFSStream.java:95)
      at org.apache.poi.poifs.filesystem.NPOIFSMiniStore.getBlockAt(NPOIFSMiniStore.java:67)
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169)
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:142)
      at org.apache.poi.poifs.filesystem.NDocumentInputStream.readFully(NDocumentInputStream.java:264)
      at org.apache.poi.poifs.filesystem.NDocumentInputStream.read(NDocumentInputStream.java:162)
      at org.apache.poi.poifs.filesystem.DocumentInputStream.read(DocumentInputStream.java:127)
      at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:121)
      at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:103)
      at org.apache.poi.util.IOUtils.copy(IOUtils.java:312)
      at org.apache.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:70)
      at org.apache.poi.hpsf.PropertySet.isPropertySetStream(PropertySet.java:393)
      at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:191)
      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
      at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:74)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
      at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
      Caused by: java.nio.channels.ClosedChannelException
      at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:158)
      at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:373)
      at org.apache.poi.poifs.nio.FileBackedDataSource.size(FileBackedDataSource.java:137)
      at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getChainLoopDetector(NPOIFSFileSystem.java:627)
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)
      ... 30 more
        ```
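When reading traces like these, it can help to reduce each OutOfMemoryError to the first frame outside the JDK, since that frame names the library code that requested the allocation. The following is only a minimal triage sketch, not part of ManifoldCF or Tika: the function name `oom_allocation_sites` and the list of JDK package prefixes are assumptions made for illustration, and the input is assumed to be the log saved as plain text.

```python
import re
from collections import Counter

# One stack frame, e.g.
#   "at java.base/java.util.BitSet.set(BitSet.java:448)"
# Leading whitespace and stray "{{" markup are tolerated.
FRAME = re.compile(r"^[\s{]*at\s+(?:java\.base/)?([\w.$<>]+)\(")

# Prefixes treated as JDK code (an assumption; extend as needed).
JDK_PREFIXES = ("java.", "javax.", "sun.", "jdk.")


def oom_allocation_sites(log_text):
    """For each OutOfMemoryError trace, count its first non-JDK frame."""
    sites = Counter()
    in_oom = False
    for line in log_text.splitlines():
        if "java.lang.OutOfMemoryError" in line:
            in_oom = True          # the frames that follow belong to this error
            continue
        if not in_oom:
            continue
        match = FRAME.match(line)
        if match is None:
            in_oom = False         # trace ended before any non-JDK frame
            continue
        method = match.group(1)
        if not method.startswith(JDK_PREFIXES):
            sites[method] += 1
            in_oom = False         # only the first non-JDK frame is counted
    return sites
```

Applied to the three OutOfMemoryError traces quoted in this report, this attributes them to `BoilerpipeHTMLContentHandler.characters`, `BoilerpipeHTMLContentHandler.flushBlock` and `UnicodeTokenizer.tokenize`, all under `de.l3s.boilerpipe`. That only shows where the heap finally ran out, not necessarily what filled it.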

       

      Follow-up: when these exceptions occur, the JVM runs out of heap space:

      13:39:39.856 [Worker thread '49'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:39.970 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:40.415 [Worker thread '34'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:40.469 [Worker thread '1'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:43.739 [Worker thread '32'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:44.697 [Worker thread '43'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:45.756 [Worker thread '33'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:45.775 [Worker thread '36'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:46.751 [Worker thread '35'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:46.753 [Worker thread '40'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:47.536 [Worker thread '45'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:48.734 [Worker thread '44'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:50.922 [Worker thread '30'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:39:54.930 [Worker thread '28'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      13:40:33.660 [Worker thread '29'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      agents process ran out of memory - shutting down
      java.lang.OutOfMemoryError: Java heap space
      at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
      at java.base/java.lang.StringBuilder.toString(StringBuilder.java:415)
      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:341)
      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:198)
      at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
      at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
      at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
      at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
      at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
      at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
      at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
      at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
      at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
      at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
      at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
      agents process ran out of memory - shutting down
      java.lang.OutOfMemoryError: Java heap space
      at java.base/java.util.Arrays.copyOf(Arrays.java:3744)
      at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:146)
      at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:531)
      at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:550)
      at java.base/java.lang.StringBuilder.append(StringBuilder.java:171)
      at java.base/java.util.regex.Matcher.appendReplacement(Matcher.java:1002)
      at java.base/java.util.regex.Matcher.replaceAll(Matcher.java:1181)
      at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(UnicodeTokenizer.java:40)
      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(BoilerpipeHTMLContentHandler.java:296)
      at de.l3s.boilerpipe.sax.CommonTagActions$3.end(CommonTagActions.java:143)
      at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.endElement(BoilerpipeHTMLContentHandler.java:183)
      at org.apache.tika.parser.html.BoilerpipeContentHandler.endElement(BoilerpipeContentHandler.java:175)
      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
      at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
      at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
      at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
      at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:224)
      at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:109)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
      at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
      at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
      at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
      at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:448)
      13:40:33.995 [Worker thread '42'] WARN org.apache.manifoldcf.jobs - Service interruption reported for job 1532551209410 connection 'file': IO exception: null
      [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@5d235104{HTTP/1.1}{0.0.0.0:8345}

      [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6105f8a3{/mcf-api-service,file:///tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-9896962439762567079.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}

      [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@12365c88{/mcf-authority-service,file:///tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}

      This occurs when the ES connector reports the following error:

      07-26-2018 19:34:25.356 Indexation (ES) file:/var/manifoldcf/corpus/000640.html CLIENTPROTOCOLEXCEPTION 46190 9

      Attachments

        1. CONNECTORS-1518.patch
          2 kB
          Karl Wright

          People

            kwright@metacarta.com Karl Wright
            svanschalkwyk Steph van Schalkwyk
