Nutch / NUTCH-383

Upgrade Nutch to Hadoop 0.7

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

    Description

      Upgrade Nutch to Hadoop 0.7, and replace all occurrences of UTF8 with Text. UTF8 is deprecated and its use is discouraged due to its limitations.

      This change will break the API, in the sense that all third-party additions will have to be updated to use the new APIs, which take Text instead of UTF8 in method parameters.

      This change also breaks backward compatibility of data in CrawlDb, LinkDb and segments. A tool to upgrade CrawlDb, LinkDb and segments can be created to facilitate the upgrade path.
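
The description does not say which limitations of UTF8 it has in mind, nor exactly why stored data becomes unreadable. As an illustration only, and assuming that a change in the serialized string layout is part of the problem, the Java sketch below uses nothing but the JDK (it does not call Hadoop's UTF8 or Text classes). It contrasts a string written with a 2-byte length prefix via DataOutputStream.writeUTF against the same string written with a 4-byte length prefix followed by standard UTF-8 bytes. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not part of Nutch or Hadoop.
class StringEncodingSketch {

    // Serializes s with a 2-byte length prefix (JDK writeUTF format).
    // Fails when the encoded form exceeds 65535 bytes.
    public static byte[] writeShortPrefixed(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buf).writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Serializes s as a 4-byte length followed by standard UTF-8 bytes.
    public static byte[] writeIntPrefixed(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(utf8.length);
            out.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reports whether s can be stored in the 2-byte-prefixed format.
    public static boolean fitsShortPrefix(String s) {
        try {
            writeShortPrefixed(s);
            return true;
        } catch (UncheckedIOException e) {
            return false;
        }
    }

    // Builds a string of n copies of c (avoids requiring Java 11's String.repeat).
    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        // The two layouts differ in size, so a reader for one cannot parse the other.
        System.out.println("short-prefixed bytes: " + writeShortPrefixed(url).length);
        System.out.println("int-prefixed bytes:   " + writeIntPrefixed(url).length);

        String big = repeat('a', 70000);
        System.out.println("70000 chars fit short prefix? " + fitsShortPrefix(big));
        System.out.println("70000 chars int-prefixed bytes: " + writeIntPrefixed(big).length);
    }
}
```

Because the two byte layouts differ, records written in one format cannot be read back by code expecting the other, which is the kind of incompatibility a one-off conversion tool would address.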

      Attachments

        1. patch.txt
          172 kB
          Andrzej Bialecki
        2. patch-v2.txt
          194 kB
          Andrzej Bialecki
        3. patch-v3.txt
          155 kB
          Andrzej Bialecki

          People

            Assignee: Andrzej Bialecki (ab)
            Reporter: Andrzej Bialecki (ab)
            Votes: 2
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: