Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
1.14
None
Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with OpenJDK 1.8.
Description
While trying to use the protocol-smb plugin (which is not part of the Nutch distribution), I realized that four steps are needed to make use of a protocol plugin successfully:
1 - put the plugin artifact into the plugins directory
2 - modify the Nutch configuration files to allow smb:// URLs and add the plugin to the list of loaded plugins (plugin.includes)
3 - extract jcifs.jar from the plugin and place it on the system classpath
4 - run Nutch with the correct system property set
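Step 4 follows from how the JDK resolves a handler for a URL scheme. On Java 8, when no URLStreamHandlerFactory supplies a handler, java.net.URL looks for a class named <package>.<protocol>.Handler in each package listed in the java.protocol.handler.pkgs system property, and loads it through the bootstrap or system class loader, never through a plugin's class loader. The setting usually documented for JCIFS is -Djava.protocol.handler.pkgs=jcifs, which resolves to jcifs.smb.Handler; that class can only be found if jcifs.jar is on the system classpath, which is why step 3 is needed. The small check below (the class name SmbUrlCheck is illustrative, not part of Nutch) only tests whether the current JVM can parse a URL with a given scheme:

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Illustrative check: can this JVM parse a URL with the given scheme? */
public class SmbUrlCheck {

    /** Returns true if java.net.URL finds a stream handler for the scheme of spec. */
    static boolean isSupported(String spec) {
        try {
            new URL(spec); // throws MalformedURLException if no handler is found for the scheme
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The JDK consults this property only when no URLStreamHandlerFactory supplied a handler.
        System.out.println("java.protocol.handler.pkgs = "
                + System.getProperty("java.protocol.handler.pkgs"));
        System.out.println("http supported: " + isSupported("http://example.org/"));
        System.out.println("smb supported:  " + isSupported("smb://server/share/"));
    }
}
```

On a plain JDK this should report http as supported and smb as unsupported. Running it again with jcifs.jar on the classpath and -Djava.protocol.handler.pkgs=jcifs should flip the smb line to true, which mirrors steps 3 and 4.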
Steps 1 and 2 are to be expected, but steps 3 and 4 require knowledge of plugin internals, which Nutch and plugin users should not need. Moreover, jcifs.jar would then be present twice on the classpath (once inside the plugin and once on the system classpath), which could cause further problems at runtime.
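One way to remove steps 3 and 4 is for the plugin system to install a single java.net.URLStreamHandlerFactory that hands out handlers provided by plugins, so the handler classes (and jcifs.jar) can stay inside the plugin's own class loader. The sketch below illustrates that idea only; PluginHandlerFactory and its register method are hypothetical names, and this does not claim to reproduce the fix committed for this issue. The toy smb handler merely parses URLs and cannot open connections.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: one JVM-wide factory that returns URL stream handlers
 * supplied by plugins, so handler classes can stay in the plugin's class loader.
 */
public class PluginHandlerFactory implements URLStreamHandlerFactory {

    private final Map<String, URLStreamHandler> handlers = new ConcurrentHashMap<>();

    /** Hypothetically called by the plugin system for each protocol a plugin provides. */
    public void register(String protocol, URLStreamHandler handler) {
        handlers.put(protocol.toLowerCase(Locale.ROOT), handler);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        // Returning null lets the JDK fall back to its built-in handlers (http, file, jar, ...).
        return handlers.get(protocol.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        PluginHandlerFactory factory = new PluginHandlerFactory();

        // Toy stand-in for a handler a protocol plugin would provide; it only parses URLs.
        factory.register("smb", new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) throws IOException {
                throw new IOException("toy handler, no SMB client behind it: " + u);
            }
        });

        // Can be called at most once per JVM; a second call throws java.lang.Error.
        URL.setURLStreamHandlerFactory(factory);

        URL url = new URL("smb://server/share/file.txt");
        System.out.println(url.getProtocol() + " " + url.getHost() + " " + url.getPath());
    }
}
```

Because URL.setURLStreamHandlerFactory can be installed only once per JVM, a design along these lines would have the component that owns the plugin registry install the factory, rather than letting each plugin try to register its own.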
Issue Links
- causes
  - NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used (Closed)
  - NUTCH-2949 Tasks of a multi-threaded map runner may fail because of slow creation of URL stream handlers (Closed)
- is related to
  - NUTCH-714 Need a SFTP and SCP Protocol Handler (Closed)
  - NUTCH-427 protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation. (Closed)
- links to