Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- 1.9
- None
- None
Description
CommonCrawlFormat is an interface for Java classes that implement methods for writing data into Common Crawl format. AbstractCommonCrawlFormat is an abstract class that implements CommonCrawlFormat and provides abstract methods for "CommonCrawl formatter" classes.
The attached patch includes some improvements for CommonCrawlFormat-based classes:
- CommonCrawlFormat and AbstractCommonCrawlFormat now provide only the getJsonData() method, which is responsible for returning the JSON data.
- AbstractCommonCrawlFormat also declares the abstract methods that each subclass has to implement in order to handle JSON objects.
- CommonCrawlFormatSimple is a StringBuilder-based formatter that now also escapes JSON string values.
This patch aims to provide a better interface for implementing and extending CommonCrawlFormat classes.
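To make the intended shape easier to discuss, here is a minimal sketch of the design described above. It is not the code from the patch: the names SketchFormat, AbstractSketchFormat, SimpleSketchFormat, and the hook methods startObject(), writeKeyValue(), closeObject(), and generateJson() are hypothetical stand-ins chosen for illustration, and the flat string-to-string field map is an assumption. Only getJsonData() is taken from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for CommonCrawlFormat: one public entry point.
interface SketchFormat {
    String getJsonData();
}

// Hypothetical stand-in for AbstractCommonCrawlFormat: getJsonData() drives
// the output, and subclasses supply the JSON-handling primitives.
abstract class AbstractSketchFormat implements SketchFormat {
    private final Map<String, String> fields = new LinkedHashMap<>();

    AbstractSketchFormat(Map<String, String> fields) {
        this.fields.putAll(fields);
    }

    // Hooks each concrete formatter has to implement (names are assumptions).
    protected abstract void startObject();
    protected abstract void writeKeyValue(String key, String value);
    protected abstract void closeObject();
    protected abstract String generateJson();

    @Override
    public final String getJsonData() {
        startObject();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            writeKeyValue(e.getKey(), e.getValue());
        }
        closeObject();
        return generateJson();
    }
}

// StringBuilder-based formatter that escapes JSON string values.
final class SimpleSketchFormat extends AbstractSketchFormat {
    private final StringBuilder sb = new StringBuilder();
    private boolean first = true;

    SimpleSketchFormat(Map<String, String> fields) {
        super(fields);
    }

    @Override
    protected void startObject() {
        sb.setLength(0); // reset so getJsonData() can be called repeatedly
        first = true;
        sb.append('{');
    }

    @Override
    protected void writeKeyValue(String key, String value) {
        if (!first) {
            sb.append(',');
        }
        first = false;
        sb.append('"').append(escape(key)).append("\":\"")
          .append(escape(value)).append('"');
    }

    @Override
    protected void closeObject() {
        sb.append('}');
    }

    @Override
    protected String generateJson() {
        return sb.toString();
    }

    // Escapes quotes, backslashes, and control characters below 0x20.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

With this layout, callers only ever see getJsonData(), while a different backend (for example, one built on a JSON library instead of a StringBuilder) would only need to implement the four hooks.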
I would really appreciate your feedback.
Thanks a lot,
Giuseppe
Attachments
Issue Links
- is part of: NUTCH-1974 keyPrefix option for CommonCrawlDataDumper tool (Closed)