Details
Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
0.8
None
None
Description
A subcollection is a subset of an index. Subcollections are defined
by URL patterns in the form of a whitelist and a blacklist: for a page
to be included in a subcollection, its URL must match the whitelist and
must not match the blacklist.
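The membership rule can be sketched in Java as follows. This is a minimal sketch, not the plugin's code. It assumes each pattern is a plain URL prefix, because the description does not say how patterns are matched. The class and method names and the blacklisted `/old/` path are hypothetical.

```java
import java.util.List;

/**
 * Sketch of the whitelist/blacklist membership rule.
 * Assumption: every pattern is treated as a URL prefix.
 */
public class SubcollectionMatch {

    /** True if the URL starts with at least one of the given prefixes. */
    static boolean matchesAny(String url, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (url.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** A page belongs to the subcollection if it matches the whitelist and not the blacklist. */
    static boolean belongs(String url, List<String> whitelist, List<String> blacklist) {
        return matchesAny(url, whitelist) && !matchesAny(url, blacklist);
    }

    public static void main(String[] args) {
        List<String> whitelist = List.of("http://lucene.apache.org/");
        // Hypothetical blacklist entry, used only to show the exclusion case.
        List<String> blacklist = List.of("http://lucene.apache.org/old/");
        System.out.println(belongs("http://lucene.apache.org/java/docs/", whitelist, blacklist)); // true
        System.out.println(belongs("http://lucene.apache.org/old/index.html", whitelist, blacklist)); // false
        System.out.println(belongs("http://ant.apache.org/", whitelist, blacklist)); // false
    }
}
```

The blacklist is checked after the whitelist, so a blacklist entry can carve an excluded area out of a broader whitelisted site.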
Subcollection definitions are read from the file subcollections.xml,
whose format is shown below. Imagine, for example, that you are crawling
all the virtual hosts of apache.org and want to tag pages matching the
URL pattern "http://lucene.apache.org/" as part of the subcollection
lucene:
<?xml version="1.0" encoding="UTF-8"?>
<subcollections>
  <subcollection>
    <name>lucene</name>
    <id>lucene</id>
    <whitelist>http://lucene.apache.org/</whitelist>
    <blacklist />
  </subcollection>
</subcollections>
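As an illustration of reading this format, here is a sketch that parses the file's contents with the JDK's DOM parser (`javax.xml.parsers`). It is not the plugin's loader. The `SubcollectionConfig` and `Subcollection` names are hypothetical. Splitting a whitelist or blacklist element into several whitespace-separated patterns is an assumption, since the example above holds only one pattern.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SubcollectionConfig {

    /** One {@code <subcollection>} entry; fields mirror its child elements. */
    record Subcollection(String name, String id, List<String> whitelist, List<String> blacklist) {}

    /** Trimmed text of the first descendant element with this tag, or "" if absent. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /**
     * Patterns in a whitelist/blacklist element. Assumption: several patterns are
     * separated by whitespace (e.g. one per line); an empty element yields an empty list.
     */
    static List<String> patterns(Element parent, String tag) {
        List<String> result = new ArrayList<>();
        for (String token : text(parent, tag).split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Parses the XML text of a subcollections file into a list of definitions. */
    static List<Subcollection> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Subcollection> result = new ArrayList<>();
            NodeList entries = doc.getElementsByTagName("subcollection");
            for (int i = 0; i < entries.getLength(); i++) {
                Element e = (Element) entries.item(i);
                result.add(new Subcollection(text(e, "name"), text(e, "id"),
                    patterns(e, "whitelist"), patterns(e, "blacklist")));
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse subcollections XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<subcollections>\n"
            + "  <subcollection>\n"
            + "    <name>lucene</name>\n"
            + "    <id>lucene</id>\n"
            + "    <whitelist>http://lucene.apache.org/</whitelist>\n"
            + "    <blacklist />\n"
            + "  </subcollection>\n"
            + "</subcollections>\n";
        for (Subcollection s : parse(xml)) {
            System.out.println(s.name() + " whitelist=" + s.whitelist() + " blacklist=" + s.blacklist());
        }
    }
}
```

An empty `<blacklist />` parses to an empty list, so under the rule above it excludes nothing.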
The plugin contains an indexing filter, a query filter, and supporting classes.
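One plausible way for the two filters to cooperate is shown below as a toy in-memory model. At indexing time each page is tagged with the names of the subcollections it belongs to. At query time, results are restricted to pages carrying a requested name. This sketch does not use the Nutch plugin interfaces. The record and method names, the prefix matching, and the per-document tag set are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only; none of these names are Nutch APIs. */
public class SubcollectionFilterModel {

    /** Hypothetical definition mirroring one subcollections.xml entry (prefix matching assumed). */
    record Definition(String name, List<String> whitelist, List<String> blacklist) {
        boolean accepts(String url) {
            return whitelist.stream().anyMatch(url::startsWith)
                && blacklist.stream().noneMatch(url::startsWith);
        }
    }

    /** Hypothetical indexed document: its URL plus the subcollection names it was tagged with. */
    record Doc(String url, Set<String> subcollections) {}

    /** "Indexing filter" step: tag the page with every subcollection that accepts its URL. */
    static Doc index(String url, List<Definition> defs) {
        Set<String> names = new TreeSet<>();
        for (Definition d : defs) {
            if (d.accepts(url)) {
                names.add(d.name());
            }
        }
        return new Doc(url, names);
    }

    /** "Query filter" step: keep only documents tagged with the requested subcollection. */
    static List<String> search(List<Doc> docs, String subcollection) {
        List<String> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.subcollections().contains(subcollection)) {
                hits.add(doc.url());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Definition> defs = List.of(
            new Definition("lucene", List.of("http://lucene.apache.org/"), List.of()));
        List<Doc> docs = List.of(
            index("http://lucene.apache.org/java/docs/", defs),
            index("http://ant.apache.org/manual/", defs));
        System.out.println(search(docs, "lucene")); // prints [http://lucene.apache.org/java/docs/]
    }
}
```

Tagging at indexing time keeps the query-side check cheap: the query filter only has to test a stored name, not re-evaluate URL patterns for every hit.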