I haven't tested it yet, but after a quick review the latest patch looks good to me. However, it would be nice to have some unit tests for the new functionality.
> Extending the authentication to work for more than one host was in my mind but I found too many possible cases. So I was
> planning to have a different configuration file where all the authentication rules can be mentioned to override the corresponding
> 'conf/nutch-site.xml' properties. The different possible cases are: [...]
OK, a separate configuration file sounds good. I don't like that we are putting a file in conf/ for a plugin, but we already do that anyway. We should probably prefix the file name with the plugin's name to make that clear, e.g. httpclient-auth.txt.
> I removed cookie related code earlier because I didn't find it to work (even before merging my work). However, I have brought
> them back in the revised patch. We can discuss more on this if required.
I think it should work. It doesn't remember cookies across different crawl cycles, but it should remember them during a single fetch.
> I have restored most of the original response reading code except for 'calculateTryToRead'. This method is not checking for
> 'Content-Length' limit. The content-length limit check present in this patch is similar to that of 'protocol-http' which is simpler
> and correct.
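To make sure we mean the same thing by a content-length limit check, here is a minimal sketch of the idea: cap the number of bytes read at the smaller of the declared Content-Length and the configured limit. This is not the code from the patch or from 'protocol-http'. The class name LimitedBodyReader, the method read and its parameters are hypothetical, and I am assuming a negative value means "no header" or "no limit" respectively.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative sketch only; all names here are hypothetical, not the plugin's classes. */
final class LimitedBodyReader {

  private static final int BUFFER_SIZE = 8 * 1024;

  /**
   * Reads at most min(declaredLength, maxContent) bytes from the stream.
   *
   * @param declaredLength value of the Content-Length header, or -1 if absent (assumption)
   * @param maxContent     configured content limit in bytes, or -1 for no limit (assumption)
   */
  static byte[] read(InputStream in, long declaredLength, int maxContent) throws IOException {
    // Start with "unbounded" and tighten the limit with whatever information we have.
    long limit = Long.MAX_VALUE;
    if (declaredLength >= 0) {
      limit = declaredLength;
    }
    if (maxContent >= 0 && limit > maxContent) {
      limit = maxContent;
    }

    ByteArrayOutputStream out = new ByteArrayOutputStream(BUFFER_SIZE);
    byte[] buf = new byte[BUFFER_SIZE];
    long total = 0;
    while (total < limit) {
      // Never ask for more bytes than the remaining allowance.
      int want = (int) Math.min(buf.length, limit - total);
      int n = in.read(buf, 0, want);
      if (n == -1) {
        break; // stream ended before the limit was reached
      }
      out.write(buf, 0, n);
      total += n;
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = new byte[100];
    // The server declares 100 bytes, but the configured limit is 64.
    System.out.println(read(new ByteArrayInputStream(body), 100, 64).length); // prints 64
  }
}
```

A unit test for the patch could exercise the same three cases: declared length above the limit, declared length below the limit, and no Content-Length header at all.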