Affects Version/s: 1.17
Fix Version/s: 1.17
Environment: Reproduced with:
commit 9139d6ec7a98aea1af943755e9802066803b02b7 (HEAD -> master, origin/master, origin/HEAD)
Merge: e61a8a3b f971ca1b
Author: Sebastian Nagel <email@example.com>
Date: Thu May 14 17:43:14 2020 +0200
Merge pull request #526 from sebastian-nagel/NUTCH-2419-urlfilter-rule-file-precedence
NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file
Patch Info: Patch Available
Steps to reproduce:
- Activate the scoring-depth plugin
- Create a new crawldb from a seed URL:
- Dump the crawldb as JSON
- Look at the JSON output
KO => `_depth_` and `_maxdepth_` are not integers in the JSON output.
The fields are correct in the crawldb, as shown by a CSV dump:
Code is here:
I do not know Java very well, but I think the problem comes from `IntWritable` and the other Hadoop `Writable` wrappers not being POJO types (or at least not serializable the way we want them to be).
One fix might be to:
- Map each primitive-type `Writable` class to a function that casts the base `Writable` interface to the concrete class and calls `get()` (boxing the value if needed).
- Call that function in the metadata conversion loop.
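The two steps above could look roughly like the following. This is a minimal sketch, not the attached patch: the class `WritableUnwrapper` and its methods are hypothetical names. It assumes hadoop-common is on the classpath (as it is in any Nutch build) and that the metadata is the `MapWritable` returned by `CrawlDatum.getMetaData()`. It also assumes the depth values are stored as `IntWritable`, as the analysis above suggests. Values of any type not in the table fall back to `toString()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.VIntWritable;
import org.apache.hadoop.io.VLongWritable;
import org.apache.hadoop.io.Writable;

/**
 * Hypothetical helper (not part of Nutch): unwraps Hadoop's primitive
 * Writable wrappers into boxed Java values, so that a JSON writer sees
 * Integer/Long/Boolean/... instead of opaque Writable objects.
 */
public final class WritableUnwrapper {

  // Step 1: each primitive Writable class maps to a function that casts the
  // base Writable to the concrete class and calls get(); the primitive
  // result is autoboxed (int -> Integer, long -> Long, ...).
  private static final Map<Class<? extends Writable>, Function<Writable, Object>> UNWRAPPERS =
      new LinkedHashMap<>();

  static {
    UNWRAPPERS.put(IntWritable.class, w -> ((IntWritable) w).get());
    UNWRAPPERS.put(LongWritable.class, w -> ((LongWritable) w).get());
    UNWRAPPERS.put(FloatWritable.class, w -> ((FloatWritable) w).get());
    UNWRAPPERS.put(DoubleWritable.class, w -> ((DoubleWritable) w).get());
    UNWRAPPERS.put(BooleanWritable.class, w -> ((BooleanWritable) w).get());
    UNWRAPPERS.put(ByteWritable.class, w -> ((ByteWritable) w).get());
    UNWRAPPERS.put(VIntWritable.class, w -> ((VIntWritable) w).get());
    UNWRAPPERS.put(VLongWritable.class, w -> ((VLongWritable) w).get());
  }

  private WritableUnwrapper() {}

  /** Boxed primitive for known wrappers (exact class match), else toString(). */
  public static Object unwrap(Writable value) {
    if (value == null) {
      return null;
    }
    Function<Writable, Object> f = UNWRAPPERS.get(value.getClass());
    return f != null ? f.apply(value) : value.toString();
  }

  /** Step 2: the metadata conversion loop; keys become strings, values are unwrapped. */
  public static Map<String, Object> convertMetadata(MapWritable metaData) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Map.Entry<Writable, Writable> e : metaData.entrySet()) {
      result.put(e.getKey().toString(), unwrap(e.getValue()));
    }
    return result;
  }
}
```

The resulting `Map<String, Object>` holds boxed `Integer`, `Long`, `Boolean`, etc. A JSON library such as Jackson writes these as JSON numbers and booleans, so `_depth_` would come out as `1` rather than a non-numeric value. The lookup uses the exact class, so subclasses of these wrappers would hit the `toString()` fallback. A custom JSON serializer for `Writable` would be an alternative design with the same per-type cases.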