Description
Nutch is unable to crawl some websites, regardless of which protocol plugin you use. The workaround you frequently find (-Djsse.enableSNIExtension=false) has no effect at all, so the advice found on the internet is clearly wrong here. The fetcher log shows:
2017-10-23 12:43:52,911 INFO api.HttpRobotRulesParser - Couldn't get robots.txt for https://www.eidsiva.net/: javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name
2017-10-23 12:43:53,011 ERROR http.Http - Failed to get protocol output
javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name
    at sun.security.ssl.ClientHandshaker.handshakeAlert(ClientHandshaker.java:1446)
    at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2016)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1125)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
    at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:152)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:72)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:271)
    at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:327)
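To check the handshake outside of Nutch, a small standalone probe may help. The following is only a sketch, not Nutch code: the class name SniHandshakeProbe and its methods are hypothetical, and it assumes that layering an SSLSocket over a plain socket roughly mirrors what the protocol plugin does at HttpResponse.java:152.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical standalone probe, not part of Nutch.
public class SniHandshakeProbe {

  /** Returns true if the throwable or one of its causes is an SSLException mentioning unrecognized_name. */
  static boolean isUnrecognizedName(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      String msg = c.getMessage();
      if (c instanceof SSLException && msg != null && msg.contains("unrecognized_name")) {
        return true;
      }
    }
    return false;
  }

  /** Attempts a TLS handshake with host:port and returns a one-line summary. */
  static String probe(String host, int port) {
    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
    try (Socket raw = new Socket()) {
      // Timeouts (10 s, chosen arbitrarily) keep the probe from hanging.
      raw.connect(new InetSocketAddress(host, port), 10_000);
      raw.setSoTimeout(10_000);
      // Layer TLS over the connected socket; the host name is passed to the factory.
      try (SSLSocket ssl = (SSLSocket) factory.createSocket(raw, host, port, true)) {
        ssl.startHandshake();
        return "handshake ok: " + ssl.getSession().getProtocol();
      }
    } catch (IOException e) {
      String kind = isUnrecognizedName(e) ? "unrecognized_name alert: " : "other failure: ";
      return kind + e;
    }
  }

  public static void main(String[] args) {
    if (args.length == 0) {
      System.out.println("usage: java SniHandshakeProbe <host>");
      return;
    }
    // Shows whether the workaround property reached this JVM at all.
    System.out.println("jsse.enableSNIExtension="
        + System.getProperty("jsse.enableSNIExtension"));
    System.out.println(probe(args[0], 443));
  }
}
```

Running it once as `java SniHandshakeProbe www.eidsiva.net` and once as `java -Djsse.enableSNIExtension=false SniHandshakeProbe www.eidsiva.net` should indicate whether the property changes the outcome for a plain JSSE client on the same JVM that runs Nutch.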
Attachments