Description
Whenever a page is parsed, one of the outputs is the directory 'parse_text'.
It is intended for the indexing phase, so that the page can be searched through a search engine such as Solr.
In my particular crawling case, I don't need to index the page contents, so creating and writing 'parse_text' is unnecessary. To improve performance, I don't want the crawler to store this information on the filesystem.
I propose a new parameter, "parser.store.text", that controls whether the 'parse_text' directory is stored. Its default value is "true", so existing behavior is unchanged.
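To make the proposal concrete, here is a minimal sketch of how such a flag could gate the output. It is not the actual patch: the class `ParseOutputPlan` and its methods are hypothetical, and a plain `java.util.Properties` object stands in for the crawler's real configuration so the example stays self-contained. The `parse_data` and `crawl_parse` directory names are not part of this issue and are listed only to show that the other parse outputs would be unaffected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration only; not taken from the crawler's source code.
public class ParseOutputPlan {

    // The parameter name proposed in this issue.
    static final String STORE_TEXT_KEY = "parser.store.text";

    // Returns true unless the parameter is explicitly set to "false".
    // A missing value falls back to "true", matching the proposed default.
    static boolean shouldStoreText(Properties conf) {
        String value = conf.getProperty(STORE_TEXT_KEY, "true");
        return Boolean.parseBoolean(value.trim());
    }

    // Lists the parse output directories that would be written.
    // Only 'parse_text' depends on the new parameter.
    static List<String> outputDirectories(Properties conf) {
        List<String> dirs = new ArrayList<>();
        dirs.add("parse_data");   // assumed: other parse output, always written
        dirs.add("crawl_parse");  // assumed: other parse output, always written
        if (shouldStoreText(conf)) {
            dirs.add("parse_text");
        }
        return dirs;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        System.out.println("default: " + outputDirectories(defaults));

        Properties noText = new Properties();
        noText.setProperty(STORE_TEXT_KEY, "false");
        System.out.println("parser.store.text=false: " + outputDirectories(noText));
    }
}
```

With no value set, the sketch keeps 'parse_text' in the list of outputs. Setting "parser.store.text" to "false" drops only that directory, which is the behavior requested above.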