Details
Improvement
Status: Open
Major
Resolution: Unresolved
ManifoldCF 2.6
None
None
Description
The Confluence crawler skips comments. For a site that uses Confluence as a recorded collaboration platform, the comments are often where the text that needs to be searched actually lives.
I've found that adding `children.comment.body.view` to the `expand` querystring field returns one level of comments. Deeper levels can be added to the response by repeating the `children.comment` segment: `children.comment.children.comment.body.view` adds the second level, and the 3rd, 4th, and 5th levels follow the same pattern, with the 5th being `children.comment.children.comment.children.comment.children.comment.children.comment.body.view`.
I realize that this doesn't capture 100% of the comments, but 5 levels of nesting seems like a reasonable chunk to capture.
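As a rough illustration of the pattern described above, here is a minimal Python sketch that generates the `expand` paths for a given nesting depth. The function names and the `body.view` base field are hypothetical and chosen only for illustration; the sketch also assumes that each level needs its own path and that the paths are passed as one comma-separated `expand` value. It is not the connector's actual code.

```python
from urllib.parse import urlencode


def comment_expand_paths(max_depth):
    """Return one expand path per comment nesting level, from 1 to max_depth."""
    return [
        ".".join(["children.comment"] * depth) + ".body.view"
        for depth in range(1, max_depth + 1)
    ]


def build_expand_query(max_depth, base_fields=("body.view",)):
    """Build the URL-encoded expand parameter.

    base_fields is a hypothetical default for whatever the page request
    already expands; the comment paths are appended after it.
    """
    fields = list(base_fields) + comment_expand_paths(max_depth)
    return urlencode({"expand": ",".join(fields)})


if __name__ == "__main__":
    for path in comment_expand_paths(5):
        print(path)
    print(build_expand_query(2))
```

Keeping the depth as a parameter would let the limit of 5 be tuned later without touching the path-building logic.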
An alternative would be to crawl comments separately and set the page-type to 'comment' rather than 'page'. While this also has value, I think fetching the comments along with the page requests offers the biggest bang for the buck.