Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: nutchgora, 1.5
    • Fix Version/s: 2.4
    • Component/s: protocol
    • Labels:
      None

      Description

      There are several issues about protocol-httpclient and several comments about rewriting the plugin with the new http client libraries. There is, however, not yet an issue for rewriting/reimplementing protocol-httpclient.

      http://hc.apache.org/httpcomponents-client-ga/

          Activity

          Talat UYARER added a comment -

          Hi Markus,

          Yes, I know that HttpClient is still in development as part of Apache HttpComponents. The second comment is very useful information for me. Actually, I asked that question because I found a small bug in protocol-http: even if the http.content.limit value is set, protocol-http fetches files of all sizes (larger files are fetched up to the limit).
          But during parsing, the parser skips incomplete files (the parser.skip.truncated setting). It seems like unnecessary effort to partially fetch content larger than the limit if it is not going to be parsed.
          What do you think about this? I will upload a patch for this issue.
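
A minimal sketch of the check described in this comment, assuming the server declares a Content-Length header. The class and method names (ContentLimitCheck, shouldSkipFetch) are hypothetical and not part of Nutch; only the property names http.content.limit and parser.skip.truncated come from the thread. A negative limit is treated as "no limit", which matches the documented meaning of http.content.limit.

```java
import java.util.OptionalLong;

/** Hypothetical helper (not Nutch code): decides up front whether a fetch is pointless. */
final class ContentLimitCheck {

    /** Parses a Content-Length header value; empty if missing, malformed, or negative. */
    static OptionalLong parseContentLength(String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        try {
            long length = Long.parseLong(headerValue.trim());
            return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Returns true when the declared length exceeds the limit and truncated
     * content would be skipped by the parser anyway.
     *
     * @param contentLengthHeader raw Content-Length value, may be null
     * @param contentLimit        value of http.content.limit (negative = unlimited)
     * @param skipTruncated       value of parser.skip.truncated
     */
    static boolean shouldSkipFetch(String contentLengthHeader,
                                   long contentLimit,
                                   boolean skipTruncated) {
        if (!skipTruncated || contentLimit < 0) {
            return false; // truncated content is still wanted, or there is no limit
        }
        OptionalLong declared = parseContentLength(contentLengthHeader);
        // Without a declared length we cannot know in advance, so we fetch.
        return declared.isPresent() && declared.getAsLong() > contentLimit;
    }
}
```

Responses without a Content-Length header (for example chunked transfers) would still have to be fetched up to the limit, so this only avoids the wasted work in the cases where the size is known beforehand.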

          Markus Jelsma added a comment -

          And to answer your question: no, I'm not working on this issue. We still manage with protocol-http and only use protocol-httpclient for TLS connections. It still works, for now.

          Markus Jelsma added a comment -

          Hi Talat - what do you mean by EOL of HttpClient? Version 4.3 was released just a few months ago. I assume you mean that Nutch's implementation of it is old, and it is indeed! This issue is about completely rewriting Nutch's protocol-httpclient plugin against the most recent version of HttpClient 4.x.

          Talat UYARER added a comment -

          Markus,

          I guess httpclient is end of life. Are you doing any development on this issue?

          Ross Judson added a comment -

          The Oracle bug report # is 7129065. HttpUrlConnection-based NTLM auth to SharePoint succeeds with JDK 6, and crashes the VM on JDK 7. I am investigating other solutions to this.

          Oleg Kalnichevski added a comment -

          For what it is worth to you, HttpClient users have been reporting the best NTLMv2 compatibility results when using JCIFS as an NTLM engine. The trouble is that the library is LGPL licensed and therefore may not be directly incorporated into ASF works. However, you might consider giving your users the option of hooking JCIFS up through an extension mechanism of some sort, similar to that used by HttpClient [1].

          Oleg

          [1] http://hc.apache.org/httpcomponents-client-ga/ntlm.html
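
One way such an extension mechanism could look is sketched below, using only reflection so that no LGPL code is referenced at compile time. Everything here is an assumption made for illustration: the NtlmEngine interface, the BuiltInNtlmEngine fallback and the NtlmEngineLoader class are hypothetical names, and a real engine would expose message-generation methods rather than just a name.

```java
/** Hypothetical extension point; a real engine would generate NTLM messages. */
interface NtlmEngine {
    String name();
}

/** Fallback used when no external engine is configured or it cannot be loaded. */
final class BuiltInNtlmEngine implements NtlmEngine {
    @Override
    public String name() {
        return "built-in";
    }
}

/** Loads an engine by class name so that optional libraries stay off the compile path. */
final class NtlmEngineLoader {

    /**
     * @param className fully qualified name of an NtlmEngine implementation
     *                  supplied by the user (for example a JCIFS-backed adapter
     *                  the user ships separately), or null for the default
     */
    static NtlmEngine load(String className) {
        if (className == null || className.isEmpty()) {
            return new BuiltInNtlmEngine();
        }
        try {
            Class<?> cls = Class.forName(className);
            return (NtlmEngine) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            // Class missing, not an NtlmEngine, or its dependencies are absent:
            // fall back rather than fail the whole fetch.
            return new BuiltInNtlmEngine();
        }
    }
}
```

The class name would typically come from a configuration property, so users who accept the LGPL terms can drop the extra jars onto the classpath themselves.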

          Ferdy Galema added a comment -

          This seems like a JVM bug; perhaps you could reproduce it using specific URLs? By the way, does anyone have an NTLMv2 example URL that is publicly accessible?

          Besides lacking NTLMv2 support, is there anything else that isn't working properly? Support for https is not entirely broken, because "https://www.iana.org/" for example can be fetched perfectly fine.

          Remi Tassing added a comment -

          With the dirty code I wrote on NTLMv2 and HttpUrlConnection, I'm having the following Java error from time to time. I believe it's due to the poor integration of my code with Nutch:

          #
          # A fatal error has been detected by the Java Runtime Environment:
          #
          # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x762135c8, pid=7320, tid=5720
          #
          # JRE version: 7.0-b147
          # Java VM: Java HotSpot(TM) Client VM (21.0-b17 mixed mode windows-x86 )
          # Problematic frame:
          # C [Secur32.dll+0x35c8]
          #
          # Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
          #
          # If you would like to submit a bug report, please visit:
          # http://bugreport.sun.com/bugreport/crash.jsp
          # The crash happened outside the Java Virtual Machine in native code.
          # See problematic frame for where to report the bug.
          #

          --------------- T H R E A D ---------------

          Current thread (0x4753a800): JavaThread "FetcherThread" daemon [_thread_in_native, id=5720, stack(0x48350000,0x483a0000)]

          siginfo: ExceptionCode=0xc0000005, reading address 0x00000010

          Registers:
          EAX=0x00000000, EBX=0x00000000, ECX=0x4839f0cc, EDX=0x02bdafe8
          ESP=0x4839f0c4, EBP=0x4839f0d4, ESI=0x002b0058, EDI=0x00000000
          EIP=0x762135c8, EFLAGS=0x00010202

          Top of Stack: (sp=0x4839f0c4)
          0x4839f0c4: 4839f0cc 65a5014e 002b0058 02bdafe8
          0x4839f0d4: 4839f0e4 6b62a15c 477c2d10 4753a928
          0x4839f0e4: 4839f180 6b62a2b1 477c2d10 477c2d00
          0x4839f0f4: 4753a800 437469b8 437469b8 052c98e8
          0x4839f104: 4839f320 025aa595 4839f354 4839f1c4
          0x4839f114: 4753a800 47657b90 47075798 470757d8
          0x4839f124: 00000000 00000001 4839f130 00000200
          0x4839f134: 00000002 4839f154 477c2d10 00000000

          Instructions: (pc=0x762135c8)
          0x762135a8: 00 e8 c2 f6 ff ff 8b f0 85 f6 74 1c 56 ff 35 54
          0x762135b8: 10 22 76 ff 15 20 11 21 76 8b 46 60 8d 4d f8 51
          0x762135c8: ff 50 10 5e c9 c2 04 00 b8 01 03 09 80 eb f4 90
          0x762135d8: 90 90 90 90 8b ff 55 8b ec 51 51 8b 45 08 8b 08

          Register to memory mapping:

          EAX=0x00000000 is an unknown value
          EBX=0x00000000 is an unknown value
          ECX=0x4839f0cc is pointing into the stack for thread: 0x4753a800
          EDX=0x02bdafe8 is an unknown value
          ESP=0x4839f0c4 is pointing into the stack for thread: 0x4753a800
          EBP=0x4839f0d4 is pointing into the stack for thread: 0x4753a800
          ESI=0x002b0058 is an unknown value
          EDI=0x00000000 is an unknown value

          Stack: [0x48350000,0x483a0000], sp=0x4839f0c4, free space=316k
          Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
          C [Secur32.dll+0x35c8] FreeCredentialsHandle+0x30
          C [net.dll+0xa15c] Java_sun_net_www_protocol_http_ntlm_NTLMAuthSequence_getCredentialsHandle+0x180
          C [net.dll+0xa2b1] Java_sun_net_www_protocol_http_ntlm_NTLMAuthSequence_getNextToken+0x137

          Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
          j sun.net.www.protocol.http.ntlm.NTLMAuthSequence.getNextToken(J[B)[B+0
          j sun.net.www.protocol.http.ntlm.NTLMAuthSequence.getAuthHeader(Ljava/lang/String;)Ljava/lang/String;+24
          j sun.net.www.protocol.http.ntlm.NTLMAuthentication.setHeaders(Lsun/net/www/protocol/http/HttpURLConnection;Lsun/net/www/HeaderParser;Ljava/lang/String;)Z+73
          j sun.net.www.protocol.http.HttpURLConnection.getServerAuthentication(Lsun/net/www/protocol/http/AuthenticationHeader;)Lsun/net/www/protocol/http/AuthenticationInfo;+760
          j sun.net.www.protocol.http.HttpURLConnection.getInputStream()Ljava/io/InputStream;+972
          j sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream()Ljava/io/InputStream;+4
          j org.apache.nutch.protocol.httpclient.HttpResponse.<init>(Lorg/apache/nutch/protocol/httpclient/Http;Ljava/net/URL;Lorg/apache/nutch/crawl/CrawlDatum;Z)V+453
          j org.apache.nutch.protocol.httpclient.Http.getResponse(Ljava/net/URL;Lorg/apache/nutch/crawl/CrawlDatum;Z)Lorg/apache/nutch/net/protocols/Response;+13
          j org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(Lorg/apache/hadoop/io/Text;Lorg/apache/nutch/crawl/CrawlDatum;)Lorg/apache/nutch/protocol/ProtocolOutput;+283
          j org.apache.nutch.fetcher.Fetcher$FetcherThread.run()V+646
          v ~StubRoutines::call_stub

          --------------- P R O C E S S ---------------

          Java Threads: ( => current thread )
          0x4708ac00 JavaThread "Thread-27" daemon [_thread_in_native, id=6032, stack(0x48cf0000,0x48d40000)]
          0x4708a400 JavaThread "MultiThreadedHttpConnectionManager cleanup" daemon [_thread_blocked, id=4920, stack(0x48450000,0x484a0000)]
          0x4708a000 JavaThread "FetcherThread" daemon [_thread_blocked, id=6244, stack(0x01210000,0x01260000)]
          0x47089800 JavaThread "FetcherThread" daemon [_thread_blocked, id=7148, stack(0x483a0000,0x483f0000)]
          =>0x4753a800 JavaThread "FetcherThread" daemon [_thread_in_native, id=5720, stack(0x48350000,0x483a0000)]
          0x4753a000 JavaThread "FetcherThread" daemon [_thread_blocked, id=7808, stack(0x48200000,0x48250000)]
          0x47539c00 JavaThread "FetcherThread" daemon [_thread_blocked, id=6348, stack(0x47300000,0x47350000)]
          0x47539000 JavaThread "FetcherThread" daemon [_thread_blocked, id=4668, stack(0x47410000,0x47460000)]
          0x470b7800 JavaThread "FetcherThread" daemon [_thread_blocked, id=4424, stack(0x480d0000,0x48120000)]
          0x4764c000 JavaThread "FetcherThread" daemon [_thread_blocked, id=1600, stack(0x48140000,0x48190000)]
          0x4764b800 JavaThread "FetcherThread" daemon [_thread_blocked, id=4476, stack(0x47b20000,0x47b70000)]
          0x4764b400 JavaThread "FetcherThread" daemon [_thread_blocked, id=8000, stack(0x47350000,0x473a0000)]
          0x4767ac00 JavaThread "SpillThread" daemon [_thread_blocked, id=5708, stack(0x47bd0000,0x47c20000)]
          0x47689400 JavaThread "communication thread" daemon [_thread_blocked, id=4976, stack(0x47260000,0x472b0000)]
          0x4711e800 JavaThread "Thread-11" [_thread_blocked, id=6608, stack(0x478d0000,0x47920000)]
          0x47089000 JavaThread "Service Thread" daemon [_thread_blocked, id=3652, stack(0x00b30000,0x00b80000)]
          0x4706c400 JavaThread "C1 CompilerThread0" daemon [_thread_blocked, id=5272, stack(0x473c0000,0x47410000)]
          0x4706b000 JavaThread "Attach Listener" daemon [_thread_blocked, id=3568, stack(0x00b90000,0x00be0000)]
          0x47069c00 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=6512, stack(0x472b0000,0x47300000)]
          0x0240f000 JavaThread "Finalizer" daemon [_thread_blocked, id=4252, stack(0x01260000,0x012b0000)]
          0x0240c800 JavaThread "Reference Handler" daemon [_thread_blocked, id=7492, stack(0x00aa0000,0x00af0000)]
          0x00b1dc00 JavaThread "main" [_thread_blocked, id=2896, stack(0x00820000,0x00870000)]

          Other Threads:
          0x02407800 VMThread [stack: 0x46fc0000,0x47010000] [id=8048]
          0x4709b000 WatcherThread [stack: 0x47460000,0x474b0000] [id=6908]

          VM state:not at safepoint (normal execution)

          VM Mutex/Monitor currently owned by a thread: None

          Heap
          def new generation total 81664K, used 13803K [0x045a0000, 0x09e30000, 0x192f0000)
          eden space 72640K, 19% used [0x045a0000, 0x0531ada0, 0x08c90000)
          from space 9024K, 0% used [0x08c90000, 0x08c90000, 0x09560000)
          to space 9024K, 0% used [0x09560000, 0x09560000, 0x09e30000)
          tenured generation total 181236K, used 108739K [0x192f0000, 0x243ed000, 0x42da0000)
          the space 181236K, 59% used [0x192f0000, 0x1fd20ff8, 0x1fd21000, 0x243ed000)
          compacting perm gen total 12288K, used 10197K [0x42da0000, 0x439a0000, 0x46da0000)
          the space 12288K, 82% used [0x42da0000, 0x43795548, 0x43795600, 0x439a0000)
          No shared spaces configured.

          Code Cache [0x025a0000, 0x027c8000, 0x045a0000)
          total_blobs=1160 nmethods=977 adapters=115 free_code_cache=30587Kb largest_free_block=31319360

          Dynamic libraries:
          ...

          VM Arguments:
          ...
          Launcher Type: SUN_STANDARD

          Environment Variables:
          ...

          --------------- S Y S T E M ---------------
          ...
          elapsed time: 336 seconds

          Remi Tassing added a comment -

          For the NTLMv2 issue I used a dirty solution in HttpResponse.java. Inside the constructor, after the getResponseBodyAsStream() attempt:
          1. I check the result code to see if it's 500 (inside finally {...})
          2. I use HttpUrlConnection to authenticate and open a connection
          3. Then I read the InputStream, get the content and change the code to 200
          The problems with that solution are that:
          1. The authentication keys are hardcoded
          2. It doesn't check whether the content is valid, but still sets the return code to 200
          3. Error code 500 doesn't necessarily mean that it's an NTLMv2 authentication problem

          I have no idea on how to write patches to the "trunk"...

          Remi
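
Two of the problems listed above (hardcoded credentials, and treating every 500 as an NTLM failure) could be approached along the lines of the sketch below. It is not the code from this comment: NtlmFallback and its methods are hypothetical names, and it assumes the server sends a standard 401 response with a WWW-Authenticate: NTLM challenge, whereas the thread reports a 500 in the failing case, so the trigger condition may need adapting. Credentials are supplied through the standard java.net.Authenticator hook that HttpURLConnection consults.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not Nutch code) for an HttpURLConnection-based NTLM retry. */
final class NtlmFallback {

    /**
     * True only when the response is a 401 that offers the NTLM scheme,
     * instead of retrying on any server error.
     *
     * @param status          HTTP status code of the first attempt
     * @param wwwAuthenticate all WWW-Authenticate header values, may be null
     */
    static boolean isNtlmChallenge(int status, List<String> wwwAuthenticate) {
        if (status != 401 || wwwAuthenticate == null) {
            return false;
        }
        for (String value : wwwAuthenticate) {
            if (value == null) {
                continue;
            }
            String scheme = value.trim().toUpperCase(Locale.ROOT);
            if (scheme.equals("NTLM") || scheme.startsWith("NTLM ")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Registers credentials read from configuration rather than hardcoding them.
     * Note that Authenticator.setDefault is JVM-wide.
     */
    static void installCredentials(final String user, final char[] password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(user, password.clone());
            }
        });
    }
}
```

The second problem in the list, blindly reporting 200, would still need the retry to propagate the status code actually returned by the second connection.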

          Lewis John McGibbney added a comment -

          When trying to access some SharePoint (IIS) websites using NTLMv2 authentication, Nutch fails and gets an error code 500. HttpClient only supports an early version of NTLM but not NTLMv2. HttpUrlConnection can be used instead.

          [1] http://oaklandsoftware.com/papers/ntlm.html
          [2] http://developer-resource.blogspot.com/2008/06/ntlm-authentication-from-java.html

          Aravind Srini added a comment -

          Thanks, Oleg, for pitching in and confirming the right thing.

          Meanwhile, SOLR-2727 was logged independently to upgrade Solr to the httpclient 4.x codeline.

          Oleg Kalnichevski added a comment -

          The 4.1.3 release of HttpCore patched a regression affecting non-blocking (NIO) SSL transports only. There have been no changes between 4.1.2 and 4.1.3 releases in blocking transport components relevant for HttpClient.

          Please let me know if you need any help migrating off HttpClient 3.1 to HttpClient 4.1.x.

          Oleg

          Aravind Srini added a comment -

          Some transitive dependencies:

          • Solr 3.1.0 seems to depend on commons-httpclient 3.1.

          Started an independent email thread with the Solr community ("solr - httpclient from 3.x to 4.1.x") to open it up for discussion.

          • Hadoop 0.20.2 depends on commons-httpclient 3.0.1 as well.

          Also, httpclient 4.1.2 depends on httpcore 4.1.2, but there seems to have been an emergency release of httpcore 4.1.3 (and httpclient was not republished afterwards), so both need to be explicitly declared in ivy.xml (or pom.xml).

          Ken Krugler added a comment -

          For what it's worth, there's a SimpleHttpFetcher in crawler-commons that uses HttpClient 4.1.

          Markus Jelsma added a comment -

          Preferably the 4.1.x version. Nutch still uses the deprecated 3.x and there are a lot of issues to be resolved such as HTTPS support.

          Aravind Srini added a comment -

          Are we talking about httpclient 4.0.1?


            People

            • Assignee:
              Unassigned
              Reporter:
              Markus Jelsma
            • Votes:
              0
              Watchers:
              6
