Description
The host normalization in Generator$Selector#reduce at line 177 seems broken:
String host = new URL(url.toString()).getHost();
...
try
catch (Exception e)
{ LOG.warn("Malformed URL: '" + host + "', skipping"); }
With the default configuration, the basic normalizer is called, which does 'new URL(host)'.
The line after that also calls 'new URL(host)'.
Since url.getHost() always returns the host without the protocol, a MalformedURLException is always thrown.
The job continues as usual, though, because the exception is caught.
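
The failure can be reproduced outside Nutch with a minimal sketch like the one below. The class name HostUrlDemo, the helper parsesAsUrl, and the example.com URL are hypothetical and only serve as illustration; the sketch uses nothing but java.net.URL and does not touch the Generator code itself.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class, not part of Nutch.
public class HostUrlDemo {

    // Returns true if java.net.URL accepts the string, false if it
    // throws MalformedURLException.
    static boolean parsesAsUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // getHost() drops the protocol, path, and query.
        String host = new URL("http://www.example.com/index.html").getHost();
        System.out.println(host);                          // www.example.com

        // A bare host has no protocol, so the URL constructor rejects it.
        System.out.println(parsesAsUrl(host));             // false

        // With a protocol in front, the same host parses fine.
        System.out.println(parsesAsUrl("http://" + host)); // true
    }
}
```

The last call only shows that the missing protocol is what triggers the exception; it is not meant as the actual fix for the Generator.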