Nutch / NUTCH-1844

testresources/testcrawl not referenced anywhere in code

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10
    • Component/s: test
    • Labels: None

    Description

      While working on NUTCH-1526 in Review Board (https://reviews.apache.org/r/9119/), lewismc tried out the ./bin/nutch dump tool on src/testresources/testcrawl and found that it failed because of an old o.a.h.io.UTF8 key type (instead of the o.a.h.io.Text type).

      I looked into this: how were the Nutch tests passing using this old code? I found that, a long time ago, Andrzej wrote a tool to update the index from the old UTF8 key format to Text. I also found that testcrawl is not referenced anywhere in the Nutch code.
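
      As a rough illustration of how one might check which key type a data file declares, here is a minimal sketch that uses only the Java standard library. It is a heuristic, not the tool mentioned above: it assumes the file begins with the `SEQ` magic bytes and that the key and value class names appear as plain text near the start of the header, key first. The class name `KeyTypeCheck` and the 256-byte window are illustrative choices. With Hadoop on the classpath, `SequenceFile.Reader#getKeyClassName()` would be the more reliable route.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of Nutch): guesses the key class of a
// SequenceFile-style file by scanning its first bytes for a class name.
final class KeyTypeCheck {
    static final String OLD_KEY = "org.apache.hadoop.io.UTF8";
    static final String NEW_KEY = "org.apache.hadoop.io.Text";

    /** Returns "UTF8", "Text", "unknown", or "not-a-sequence-file". */
    static String classify(byte[] header) {
        if (header.length < 4 || header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
            return "not-a-sequence-file";
        }
        // ISO-8859-1 maps every byte to exactly one char, so binary bytes
        // in the header cannot break the substring search.
        String text = new String(header, StandardCharsets.ISO_8859_1);
        int oldAt = text.indexOf(OLD_KEY);
        int newAt = text.indexOf(NEW_KEY);
        // Assumption: the key class name precedes the value class name, so
        // the earliest match is treated as the key type. A file whose key is
        // some other class and whose value is UTF8 or Text would be misreported.
        if (oldAt >= 0 && (newAt < 0 || oldAt < newAt)) {
            return "UTF8";
        }
        if (newAt >= 0) {
            return "Text";
        }
        return "unknown";
    }

    /** Reads up to 256 bytes from the start of the file and classifies them. */
    static String classifyFile(Path file) throws IOException {
        byte[] buf = new byte[256];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < buf.length && (r = in.read(buf, n, buf.length - n)) > 0) {
                n += r;
            }
        }
        return classify(Arrays.copyOf(buf, n));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + classifyFile(Paths.get(arg)));
        }
    }
}
```

      Passing the paths of individual data files under src/testresources/testcrawl as arguments would print one guess per file; a result of `UTF8` would be consistent with the failure described above.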

      My suggestion:

      • we remove the testcrawl (it's not used)
      • if we don't remove it, we at least run Andrzej's tool on it and then upgrade it to use o.a.h.io.Text keys.

      I'll take care of this.

          People

            Assignee: chrismattmann Chris A. Mattmann
            Reporter: chrismattmann Chris A. Mattmann
            Votes: 0
            Watchers: 3