Andrzej Bialecki added a comment - 05/Jan/07 03:55 PM
JCIFS is licensed under the LGPL, so it cannot be included in the Nutch distribution. As a consequence, we could add this plugin, but it wouldn't be part of the regular build ...
The best way is to make the plugin available on plugin central, so that people who need the plugin can download it from there.

New features are not critical. This plugin uses an LGPL library, which cannot be included in the Nutch repository.
There is an error in the plugin.xml file: the plugin id should be protocol-smb and not protocol-file!
<?xml version="1.0" encoding="UTF-8" ?>
This is an update to the previous version. Check the included readme.txt.
Title: protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares

A. Introduction

The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements

B. Installation

1) Binaries only: The protocol-smb files can be found in the ../plugins directory.
2) Source code: The protocol-smb sources can be found in the ../src directory.

C. Known Issues

1) MalformedURLException: unknown protocol: smb
   The SMB URL protocol handler is not being successfully installed.
   Workaround:
   a) A short-term solution is to install the JCIFS jar
   b) After completing step a), if the exception is still thrown
      -Djava.protocol.handler.pkgs=jcifs
   c) You can also set the property in your code, for example if
   Also you can visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html

2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
   This problem usually occurs if the following properties are not set correctly in
   Also refer to the following resources for more information on the list of
   http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
   N.B. All properties should be set in the "smb.properties" file. You can set

3) Only tested on Windows XP and Windows Server 2003. Please report any tests

The update fixes some issues which I had with the old version when trying to use it with Nutch 1.0-dev.
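The readme's example for workaround step c) did not survive in this thread, so the following is only a minimal sketch of one way to set the property from code, not the plugin author's version. The class name SmbHandlerSetup and its methods are hypothetical. The sketch assumes the JCIFS jar is on the classpath at runtime, since the JDK resolves the handler as jcifs.smb.Handler from the package prefix, and that the property is set before the first smb:// URL is constructed.

```java
// Hypothetical helper (not part of protocol-smb): adds "jcifs" to the
// java.protocol.handler.pkgs system property without discarding any
// handler packages that are already registered.
public final class SmbHandlerSetup {

    public static final String HANDLER_PKGS = "java.protocol.handler.pkgs";

    private SmbHandlerSetup() {
    }

    // The property is a '|'-separated list of package prefixes.
    // Returns the existing list with pkg prepended, or unchanged if present.
    public static String mergeHandlerPkgs(String existing, String pkg) {
        if (existing == null || existing.trim().isEmpty()) {
            return pkg;
        }
        for (String entry : existing.split("\\|")) {
            if (entry.trim().equals(pkg)) {
                return existing;
            }
        }
        return pkg + "|" + existing;
    }

    // Call this early, before any smb:// URL object is created.
    public static void registerJcifs() {
        String current = System.getProperty(HANDLER_PKGS);
        System.setProperty(HANDLER_PKGS, mergeHandlerPkgs(current, "jcifs"));
    }

    public static void main(String[] args) {
        registerJcifs();
        System.out.println(HANDLER_PKGS + "=" + System.getProperty(HANDLER_PKGS));
    }
}
```

Merging rather than overwriting keeps any handler packages that another component has already registered, which is the same effect as passing -Djava.protocol.handler.pkgs=jcifs on the command line when no other value is set.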
Is there a reason why this plugin only handles directories? I had to make the following changes to enable file crawling:
In SMBResponse.java:
Also, it got stuck in the file-not-found case. After examining the protocol-file code, I moved the else statement in SMB.java, lines 76 and 77, outside of the curly bracket on line 78. After this change, the code could continue after encountering a file not found rather than looping forever. Since then, it seems to work nicely on Windows Vista. Thanks for the plugin!

Fixed reading of SMB files, updated to jcifs 1.3.0, enhanced the smoke test app. Protected special characters such as apostrophe and hash mark with URL encoding. Fixed the infinite retry loop in SMB.java. Tried but could not activate the Apache logging.
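To illustrate the URL-encoding fix mentioned above, here is a minimal sketch of percent-encoding an apostrophe and a hash mark in an SMB URL. It is not the code from the attached patch: the class SmbUrlEscaper and its escape method are hypothetical, and escaping '%' as well is an added assumption so that the result can be decoded unambiguously.

```java
// Hypothetical helper (not the plugin's actual code): percent-encodes the
// characters named in the comment above so that a '#' in a file name is not
// read as a URL fragment and an apostrophe survives later URL handling.
public final class SmbUrlEscaper {

    private SmbUrlEscaper() {
    }

    public static String escape(String url) {
        StringBuilder out = new StringBuilder(url.length() + 8);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            switch (c) {
                case '%':
                    out.append("%25"); // assumption: keep the encoding reversible
                    break;
                case '#':
                    out.append("%23");
                    break;
                case '\'':
                    out.append("%27");
                    break;
                default:
                    out.append(c); // everything else is passed through untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("smb://host/share/Bob's #1.txt"));
    }
}
```

The sketch deliberately leaves every other character alone, so spaces and non-ASCII names would still need whatever handling the plugin already applies.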