Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
nutchgora
-
None
-
None
-
Patch Available
Description
1. crawl command (nutch1.patch)
The class was renamed to Crawler but the references to it were not updated.
2. URL filter (nutch2.patch)
This avoids an NPE on bogus URLs whose host has no suffix.
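The kind of guard described above can be sketched as follows. This is only an illustration, not the content of nutch2.patch: the class `SuffixGuardSketch`, the helper `findSuffix`, and the tiny `KNOWN_SUFFIXES` table are all hypothetical stand-ins, and the filter simply returns null to reject a URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class SuffixGuardSketch {
    // Hypothetical stand-in for a real domain suffix table.
    private static final Set<String> KNOWN_SUFFIXES = Set.of("com", "org", "net");

    /** Returns the known suffix of the host, or null when there is none. */
    static String findSuffix(String host) {
        int dot = host.lastIndexOf('.');
        if (dot < 0 || dot == host.length() - 1) {
            return null; // no dot, or trailing dot: nothing to look up
        }
        String candidate = host.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN_SUFFIXES.contains(candidate) ? candidate : null;
    }

    /** Returns the URL unchanged if accepted, or null to reject it. */
    static String filter(String urlString) {
        try {
            String host = new URI(urlString).getHost();
            if (host == null) {
                return null; // URL without a usable host
            }
            String suffix = findSuffix(host);
            if (suffix == null) {
                return null; // guard: reject instead of dereferencing a null suffix
            }
            return urlString;
        } catch (URISyntaxException e) {
            return null; // malformed URL
        }
    }
}
```

With this guard, a URL such as `http://localhost/` is rejected quietly rather than triggering an exception later in the filter.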
3. Content-Length limit (nutch3.patch)
This is related to NUTCH-899.
The patch prevents the entire flush operation on the Gora datastore from crashing when the MySQL blob limit is exceeded by a few bytes. Both the protocol-http and protocol-httpclient plugins were affected.
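One way to keep fetched content strictly within a byte limit is sketched below. It is an assumption about the general technique, not the code in nutch3.patch: the class `ContentLimitSketch`, the method `readLimited`, and the parameter `maxContent` are hypothetical names. The point is that each read is capped by the remaining budget, so the result can never overshoot the limit, even by a few bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ContentLimitSketch {
    /**
     * Reads at most maxContent bytes from the stream.
     * Content beyond the limit is left unread (truncated).
     */
    static byte[] readLimited(InputStream in, int maxContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxContent;
        while (remaining > 0) {
            // Never ask for more than the remaining budget.
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n == -1) {
                break; // end of stream before the limit was reached
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }
}
```

Capping the read size, rather than checking the total after a full buffer has been appended, is what rules out an overshoot of up to one buffer length.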
4. Ivy configuration (nutch4.patch)
- Change the xercesImpl and restlet versions. Both changes are required: the current xercesImpl version makes a JUnit test crash, and the current restlet version is missing from the default Maven repository.
- Add gora-hbase and zookeeper (an HBase dependency), and the MySQL connector. These jars are necessary to run Gora with the HBase or MySQL datastores. (This is more a suggestion than a requirement.)
- Add com.jcraft/jsch, which is a protocol-sftp plugin dependency.
Attachments
1. Fix crawl command | Closed | Julien Nioche
2. Bugfix for Content-Length limit in http protocols | Closed | Julien Nioche