When fetching pages that contain meta-refresh tags, the target URL is taken at face value without any filtering. Some URLs, such as those generated by Struts, come back with a jsessionid or with a query string. Examples:
http://www.somesite.com;jsessionid=3123123412ADBE3344
...
http://www.somesite.com?querystring=value
RegexURLFilter matches these URLs against the following rule in regex-urlfilter.txt and, because the rule starts with '-', excludes them:
-[?*!@=]
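To make the effect of this rule concrete, here is a minimal sketch that applies only its regex body to the example URLs. It assumes the rule is evaluated as an unanchored search (a match anywhere in the URL counts). The class and method names are hypothetical; this is not Nutch's RegexURLFilter code.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of the "-[?*!@=]" rule from regex-urlfilter.txt.
 * Not Nutch code; it only shows which example URLs the character class hits.
 */
public class SuspiciousCharRule {
    // Regex body of the rule without the leading '-' (which means "exclude").
    private static final Pattern SUSPICIOUS = Pattern.compile("[?*!@=]");

    /** Returns true if the rule would exclude the URL (assumes an unanchored search). */
    public static boolean rejected(String url) {
        // find() succeeds if any of ? * ! @ = appears anywhere in the URL.
        return SUSPICIOUS.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.somesite.com;jsessionid=3123123412ADBE3344",
            "http://www.somesite.com?querystring=value",
            "http://www.somesite.com/"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + (rejected(url) ? "rejected" : "passes this rule"));
        }
    }
}
```

Both problem URLs trip the rule: the jsessionid form through its '=' and the query-string form through its '?'.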
Should these URLs be cleaned up so that they can be processed without matching the filter above, or should they continue to be ignored as they are now?
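For the cleanup option, one possible approach is to strip the session id and the query string from the meta-refresh target before it reaches the URL filter. The sketch below is a hypothetical helper written for illustration only. It is not an existing Nutch API, and the choice to drop the whole query string is an assumption.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical cleanup step for meta-refresh target URLs, applied before
 * URL filtering. Names are illustrative; this is not part of Nutch.
 */
public class MetaRefreshUrlCleaner {
    // Matches a ";jsessionid=..." path parameter up to the next '?' or '#'.
    private static final Pattern JSESSIONID =
        Pattern.compile(";jsessionid=[^?#]*", Pattern.CASE_INSENSITIVE);

    // Same character class as the "-[?*!@=]" rule in regex-urlfilter.txt.
    private static final Pattern SUSPICIOUS = Pattern.compile("[?*!@=]");

    /** Removes any jsessionid path parameter, then drops the query string. */
    public static String clean(String url) {
        String withoutSession = JSESSIONID.matcher(url).replaceAll("");
        int queryStart = withoutSession.indexOf('?');
        // Everything from '?' onward (the query and any fragment) is discarded.
        return queryStart >= 0 ? withoutSession.substring(0, queryStart) : withoutSession;
    }

    /** True if the URL no longer contains any character the rule excludes. */
    public static boolean passesRule(String url) {
        return !SUSPICIOUS.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.somesite.com;jsessionid=3123123412ADBE3344",
            "http://www.somesite.com?querystring=value"
        };
        for (String url : urls) {
            String cleaned = clean(url);
            System.out.println(url + " -> " + cleaned + " (passes rule: " + passesRule(cleaned) + ")");
        }
    }
}
```

Removing the jsessionid is usually harmless, since it only identifies the session. Dropping the query string is riskier, because different query values can select different pages and would then collapse onto one URL, so that part is a policy decision rather than a safe default.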
NUTCH-255