  1. Nutch
  2. NUTCH-1949 Dump out the Nutch data into the Common Crawl format
  3. NUTCH-2251

Make CommonCrawlFormatJackson instance reusable by properly handling object state

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: commoncrawl
    • Labels: None

    Description

      The class `CommonCrawlFormatJackson` keeps appending documents when a single instance is used to format more than one document, so output from earlier documents carries over into later ones.
      The class should be modified to manage its state so that the same instance can be reused, instead of creating a new one for each document being dumped.

      This was suggested in the previous fix for the related format issue: https://github.com/apache/nutch/pull/103
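
      The reuse problem described above can be illustrated with a minimal sketch. This is not the Nutch code: `ReusableFormatterSketch`, its `format` method, and the JSON layout are hypothetical stand-ins, assumed only for illustration. The point is the reset of per-document state at the start of each call, which is what allows one instance to serve many documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a formatter like CommonCrawlFormatJackson.
// Names, methods, and output layout are assumptions, not the Nutch API.
class ReusableFormatterSketch {
    // Per-document state. If it is never cleared, each call appends
    // to the output of the previous documents.
    private final StringBuilder buffer = new StringBuilder();

    // Formats one document as a flat JSON object.
    String format(String url, Map<String, String> metadata) {
        buffer.setLength(0); // reset state so the instance can be reused
        buffer.append("{\"url\":\"").append(escape(url)).append('"');
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            buffer.append(",\"").append(escape(e.getKey())).append("\":\"")
                  .append(escape(e.getValue())).append('"');
        }
        buffer.append('}');
        return buffer.toString();
    }

    // Escapes backslashes and double quotes only; enough for this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

// Small demo: one formatter instance, two documents.
class ReuseDemo {
    public static void main(String[] args) {
        ReusableFormatterSketch formatter = new ReusableFormatterSketch();

        Map<String, String> first = new LinkedHashMap<>();
        first.put("title", "A");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("title", "B");

        System.out.println(formatter.format("http://example.org/a", first));
        System.out.println(formatter.format("http://example.org/b", second));
    }
}
```

      With the reset in place, the second call returns only the second document. Removing the `buffer.setLength(0)` line reproduces the appending behavior this issue describes.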

            People

              Assignee: Unassigned
              Reporter: Thamme Gowda (thammegowda)
              Votes: 0
              Watchers: 2
