Nutch / NUTCH-2725

Plugin lib-http to support per-host configurable cookies

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.16
    • Component/s: protocol
    • Labels: None
    • Patch Info: Patch Available

    Attachments

      1. NUTCH-2725.patch (8 kB, Markus Jelsma)
      2. NUTCH-2725.patch (7 kB, Markus Jelsma)

      Activity

        snagel Sebastian Nagel added a comment -

        Hi markus17, looks good and works. A few minor points:

        • Converting the URL object to a String and then parsing it again doesn't seem efficient (could just pass the URL object itself):
          cookie = http.getCookie(url.toString());
          ...
          public String getCookie(String url) {
             if (hostCookies != null) {
               return hostCookies.get(URLUtil.getHost(url));
             }
          ...

        • Comment lines in the cookies.txt file cause an exception, and the rest of the file is ignored (invalid lines should generally be reported and skipped, and reading should continue):
          2019-07-25 16:58:24,052 WARN  http.Http - Failed to read http.agent.host.cookie.file cookies.txt: java.lang.ArrayIndexOutOfBoundsException: 1
                  at org.apache.nutch.protocol.http.api.HttpBase.setConf(HttpBase.java:278)

        • Could add "http.agent.host.cookie.file" to nutch-default.xml.
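        The first two review points can be illustrated with a small standalone sketch. It is not the code that was committed: the class name HostCookies, its methods, and the line format (a host name and a cookie string separated by a tab, with "#" starting a comment) are assumptions; the ArrayIndexOutOfBoundsException: 1 on comment lines only suggests that each line is split into two fields. The sketch looks the host up directly from a java.net.URL, and it reports and skips comment or malformed lines instead of aborting the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not Nutch's actual class) holding per-host cookies.
 * Assumed line format: host<TAB>cookie-string; lines starting with '#' are comments.
 */
final class HostCookies {

  private final Map<String, String> cookiesByHost = new HashMap<>();

  /** Parses all lines, skipping blank, comment and malformed lines instead of aborting. */
  static HostCookies parse(Reader source) throws IOException {
    HostCookies result = new HostCookies();
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      int lineNumber = 0;
      while ((line = reader.readLine()) != null) {
        lineNumber++;
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
          continue; // blank line or comment: nothing to do
        }
        // Split into at most two fields, so the cookie string may itself contain tabs.
        String[] parts = trimmed.split("\t", 2);
        if (parts.length < 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
          // Report the invalid line and keep reading the rest of the file.
          System.err.println("Skipping invalid cookie line " + lineNumber + ": " + line);
          continue;
        }
        result.cookiesByHost.put(parts[0].trim().toLowerCase(Locale.ROOT), parts[1].trim());
      }
    }
    return result;
  }

  /** Takes the URL object itself, so no String round trip and re-parse is needed. */
  String getCookie(URL url) {
    return cookiesByHost.get(url.getHost().toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) throws IOException {
    String file = "# host<TAB>cookie\n"
        + "example.com\tsession=abc; lang=en\n"
        + "no-tab-on-this-line\n";
    HostCookies cookies = parse(new StringReader(file));
    System.out.println(cookies.getCookie(new URL("https://example.com/page"))); // prints session=abc; lang=en
    System.out.println(cookies.getCookie(new URL("https://other.org/")));       // prints null
  }
}
```

        Lower-casing host names on both sides is a choice made in this sketch, so that an entry written as EXAMPLE.com in the file still matches a URL on example.com.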
        markus17 Markus Jelsma added a comment -

        Addressed all three points. Thanks Sebastian!


        snagel Sebastian Nagel added a comment -

        +1 looks good!
        markus17 Markus Jelsma added a comment -

        Committed a67c9bee..54f73bf7 master -> master

        Thanks Sebastian!

        hudson Hudson added a comment -

        FAILURE: Integrated in Jenkins build Nutch-trunk #3630 (See https://builds.apache.org/job/Nutch-trunk/3630/)
        NUTCH-2725 Plugin lib-http to support per-host configurable cookies (markus: https://github.com/apache/nutch/commit/54f73bf78ded8b66ba262270d069232417bbe391)

        • (edit) src/plugin/protocol-okhttp/src/java/org/apache/nutch/protocol/okhttp/OkHttpResponse.java
        • (edit) src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java
        • (edit) src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
        • (edit) conf/nutch-default.xml
        • (edit) src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
        • (add) conf/cookies.txt

        People

          Assignee: Markus Jelsma (markus17)
          Reporter: Markus Jelsma (markus17)
          Votes: 0
          Watchers: 3
