Description
Boilerpipe also runs for non-(X)HTML pages, which requires more resources.
In my test scenario, my websites contain large PDFs, and with Boilerpipe enabled I have to assign 8500 MB of Java heap for the crawl job to finish without issues.
Disabling Boilerpipe lets me reduce the JVM heap to 500 MB with no issues.
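
One possible direction for a fix, sketched here under assumptions, is to check the document's content type and run Boilerpipe only when it is (X)HTML. The class name `BoilerpipeGate` and the method `shouldApplyBoilerpipe` are hypothetical and are not part of Boilerpipe or of the crawler; the sketch also assumes the content type is available as a string before extraction starts.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether Boilerpipe should run for a document.
public class BoilerpipeGate {
    // Assumption: only these two media types count as (X)HTML.
    private static final Set<String> HTML_TYPES =
            Set.of("text/html", "application/xhtml+xml");

    public static boolean shouldApplyBoilerpipe(String contentType) {
        if (contentType == null) {
            return false; // unknown type: skip Boilerpipe to stay on the cheap path
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return HTML_TYPES.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(shouldApplyBoilerpipe("text/html; charset=UTF-8"));
        System.out.println(shouldApplyBoilerpipe("application/pdf"));
    }
}
```

With a check like this in place, large PDFs would bypass Boilerpipe entirely, which matches the low-heap behavior observed when Boilerpipe is disabled.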