Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Implemented
- Version: 1.17
Description
The dumps written by CrawlDbReader (text, CSV, JSON) are not compressed, even when file output compression is configured. For example, when running
$> bin/nutch readdb \
     -Dmapreduce.output.fileoutputformat.compress=true \
     -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec \
     crawldb/ -dump crawldb.dump -format json
the output should be compressed using bzip2.
See the Hadoop class FileOutputFormat and the implementation in TextOutputFormat.
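The decision logic a dump writer needs can be sketched without Hadoop on the classpath. The following is only an illustration under stated assumptions, not the Nutch or Hadoop implementation: java.util.Properties stands in for the Hadoop Configuration, JDK streams stand in for the Hadoop codecs (the JDK has no bzip2 stream, so only gzip and deflate are mapped), and the class name DumpCompressionSketch and its methods are hypothetical. In Hadoop itself the corresponding steps are FileOutputFormat.getCompressOutput, FileOutputFormat.getOutputCompressorClass, CompressionCodec.getDefaultExtension and CompressionCodec.createOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical, Hadoop-free sketch of "honor the configured output compression". */
public class DumpCompressionSketch {

  // Property names as used in the command from the description.
  public static final String COMPRESS = "mapreduce.output.fileoutputformat.compress";
  public static final String CODEC = "mapreduce.output.fileoutputformat.compress.codec";

  // Real Hadoop codec class names, mapped below to JDK streams as stand-ins.
  public static final String GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec";
  public static final String DEFAULT_CODEC = "org.apache.hadoop.io.compress.DefaultCodec";

  /** Step 1: is output compression switched on? (cf. getCompressOutput) */
  public static boolean compressOutput(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(COMPRESS, "false"));
  }

  /** Step 2: which codec? Falls back to gzip when none is configured. */
  static String codecName(Properties conf) {
    return conf.getProperty(CODEC, GZIP_CODEC);
  }

  /** Step 3: file name extension of the codec, empty if uncompressed. */
  public static String extension(Properties conf) {
    if (!compressOutput(conf)) {
      return "";
    }
    switch (codecName(conf)) {
      case GZIP_CODEC: return ".gz";
      case DEFAULT_CODEC: return ".deflate";
      default:
        throw new IllegalArgumentException("No stand-in for " + codecName(conf));
    }
  }

  /** Step 4: wrap the raw file stream in the codec's stream. */
  public static OutputStream wrap(Properties conf, OutputStream raw) throws IOException {
    if (!compressOutput(conf)) {
      return raw;
    }
    switch (codecName(conf)) {
      case GZIP_CODEC: return new GZIPOutputStream(raw);
      case DEFAULT_CODEC: return new DeflaterOutputStream(raw);
      default:
        throw new IllegalArgumentException("No stand-in for " + codecName(conf));
    }
  }

  /** Writes one record through the (possibly compressing) stream and returns the bytes. */
  public static byte[] dump(Properties conf, String record) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = wrap(conf, sink)) {
      out.write(record.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(COMPRESS, "true");
    conf.setProperty(CODEC, GZIP_CODEC);
    byte[] bytes = dump(conf, "{\"url\":\"http://example.org/\"}\n");
    System.out.println("extension=" + extension(conf) + ", bytes=" + bytes.length);
  }
}
```

The point of the sketch is that the writer decides on compression in one place, before the output file is created, so that both the file name extension and the stream wrapping follow the same configuration.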