Description
The metadata in the output of readdb -dump is not readable: the entries are concatenated with no separator between them.
$bin/nutch readdb crawldb/ -dump dir
http://www.baidu.com/ Version: 7
Status: 3 (db_gone)
Fetch time: Sat Aug 17 22:35:37 CST 2013
Modified time: Thu Jan 01 08:00:00 CST 1970
Retries since fetch: 0
Retry interval: 3888000 seconds (45 days)
Score: 1.0
Signature: null
Metadata: m1: v22m3: v3m2: v2m5: v5m4: m4_pst_: robots_denied(18), lastModified=0m6: v6
So I changed the metadata format to this:
Metadata: m1=v22;m3=v3;m2=v2;m5=v5;m4=m4;_pst_=robots_denied(18), lastModified=0;m6=v6;
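The formatting change can be illustrated with a minimal sketch. It is not the actual patch: the class and method names are hypothetical, and it uses a plain java.util.Map in place of the metadata type that the Nutch CrawlDatum holds, so that it runs on its own. It only shows the proposed layout, where each entry is written as key=value and terminated with a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the dump formatting; not Nutch's own class.
final class MetadataFormat {

    // Renders each entry as "key=value;" so adjacent entries no longer run together.
    static String formatMetadata(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder("Metadata: ");
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            sb.append(entry.getKey())
              .append('=')
              .append(entry.getValue())
              .append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, which makes the output predictable.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("m1", "v22");
        metadata.put("m3", "v3");
        metadata.put("_pst_", "robots_denied(18), lastModified=0");
        System.out.println(formatMetadata(metadata));
        // prints: Metadata: m1=v22;m3=v3;_pst_=robots_denied(18), lastModified=0;
    }
}
```

A value that itself contains "=" or ";" (as the _pst_ value above contains "=") would still be ambiguous to a parser, so this layout is aimed at human readability rather than round-tripping.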