Description
The CommonCrawlDataDumper tool can generate CBOR-encoded files in the Common Crawl format from data crawled by Nutch. By default, CommonCrawlDataDumper keeps each file's original extension.
We are going to add a command-line option (e.g., -extension) that lets the user specify a file extension to use in place of the original one.
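As a rough illustration of the intended behavior, here is a minimal Java sketch of how a user-supplied extension could replace the original one in an output file name. The class name ExtensionOverride and the method applyExtension are hypothetical and are not part of Nutch; the real change may handle the option differently. The sketch assumes that an absent or empty value means "keep the original extension", matching the current default.

```java
public class ExtensionOverride {

    /**
     * Hypothetical helper (not Nutch code): returns fileName with its
     * extension replaced by the user-supplied override. If no override
     * is given, the name is returned unchanged (the current default).
     */
    public static String applyExtension(String fileName, String override) {
        if (override == null || override.isEmpty()) {
            return fileName;
        }
        // Accept both "cbor" and ".cbor" as the option value.
        String ext = override.startsWith(".") ? override.substring(1) : override;
        if (ext.isEmpty()) {
            return fileName;
        }
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        int dot = fileName.lastIndexOf('.');
        // A dot marks an extension only if it comes after the last path
        // separator and is not the first character of the base name.
        String base = (dot > slash + 1) ? fileName.substring(0, dot) : fileName;
        return base + "." + ext;
    }

    public static void main(String[] args) {
        // With an override, the original extension is replaced.
        System.out.println(applyExtension("index.html", "cbor")); // index.cbor
        // Without an override, the original name is kept.
        System.out.println(applyExtension("index.html", null));   // index.html
    }
}
```

Names with no extension simply get the new one appended, so "dir.v1/page" becomes "dir.v1/page.cbor" under this sketch.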
Attachments
Issue Links
- is related to TIKA-1610 CBOR Parser and detection [improvement] (Resolved)