I noticed that there is no activity record logged for documents excluded by the Document Filter transformation connector in the WebCrawler connector.
To reproduce the issue on MCF out of the box:
- Null output connector
- Web repository connector
- DocumentFilter added which only accepts application/msword (doc/docx) documents
The simple history does not mention the excluded documents (except for HTML documents). They only have a fetch activity and nothing else (see simple_history_web.jpeg).
The excluded documents can only be seen in the MCF log (with DEBUG verbosity enabled on connectors):
The related code is in WebcrawlerConnector.java, l. 904:
The activityResultCode is null.
If we configure the same job with a Local File System connector and the same Document Filter transformation connector, the simple history lists all the excluded documents (see simple_history_files.jpeg), and the code logs an activity record with a specific error code (class FileConnector, l. 415):
So I think the Web Crawler connector should behave the same way as FileConnector and explicitly mention all the documents the user has excluded.
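
To illustrate the behaviour I am asking for, here is a minimal, self-contained sketch. It does not use the ManifoldCF API: `ExclusionSketch`, `History`, `Record`, `process` and the `EXCLUDED_MIMETYPE` value are all hypothetical stand-ins I made up for this example. The only point is that the exclusion branch writes a history record with a non-null result code instead of returning silently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; NOT the ManifoldCF classes.
class ExclusionSketch {
    // Assumed result code value, chosen for this example.
    static final String EXCLUDED_MIMETYPE = "EXCLUDEDMIMETYPE";

    /** One row of a simplified "simple history". */
    static final class Record {
        final String activity;
        final String documentId;
        final String resultCode;
        final String description;

        Record(String activity, String documentId, String resultCode, String description) {
            this.activity = activity;
            this.documentId = documentId;
            this.resultCode = resultCode;
            this.description = description;
        }
    }

    /** In-memory history that just collects records. */
    static final class History {
        final List<Record> records = new ArrayList<>();

        void record(String activity, String documentId, String resultCode, String description) {
            records.add(new Record(activity, documentId, resultCode, description));
        }
    }

    /**
     * Returns true if the document is passed on, false if it is excluded.
     * In both cases a record with a non-null result code is written.
     */
    static boolean process(String documentId, String mimeType,
                           Set<String> acceptedMimeTypes, History history) {
        if (!acceptedMimeTypes.contains(mimeType)) {
            // The exclusion is recorded explicitly instead of being dropped silently.
            history.record("process", documentId, EXCLUDED_MIMETYPE,
                "Excluded because of mime type ('" + mimeType + "')");
            return false;
        }
        history.record("process", documentId, "OK", null);
        return true;
    }

    public static void main(String[] args) {
        Set<String> accepted = new HashSet<>(Arrays.asList("application/msword"));
        History history = new History();
        process("http://example.com/report.doc", "application/msword", accepted, history);
        process("http://example.com/paper.pdf", "application/pdf", accepted, history);
        for (Record r : history.records) {
            System.out.println(r.documentId + " " + r.activity + " " + r.resultCode);
        }
    }
}
```

With something along these lines, every excluded document would show up in the simple history with its own result code, as it already does for the file system connector.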