Nutch / NUTCH-1844

testresources/testcrawl not referenced anywhere in code

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10
    • Component/s: test
    • Labels:
      None

      Description

      While working on NUTCH-1526 in Review Board (https://reviews.apache.org/r/9119/), Lewis John McGibbney tried out the ./bin/nutch dump tool on src/testresources/testcrawl and found that it failed because the data still uses the old o.a.h.io.UTF8 key type instead of o.a.h.io.Text.

      I looked into this: how were the Nutch tests passing with this old data? I found that Andrzej wrote a tool a long time ago to update the index from the old UTF8 key format to Text. I also found that the testcrawl is not referenced anywhere in the Nutch code.
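
One way to double-check the "not referenced anywhere" observation is a quick scan of the source tree for the string `testcrawl`. The sketch below is only an illustration and is not part of this issue or its fix; the class name `FindReferences`, the method `filesMentioning`, and the default root `src` are hypothetical. It skips files that live inside a directory named like the search term, so the test data does not match itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch.
class FindReferences {

    /** True if any element of the path is a directory or file named exactly dirName. */
    static boolean isInside(Path p, String dirName) {
        for (Path part : p) {
            if (part.toString().equals(dirName)) {
                return true;
            }
        }
        return false;
    }

    /** Returns regular files under root that contain needle, ignoring files under a directory named needle. */
    static List<Path> filesMentioning(Path root, String needle) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path p : files) {
            if (isInside(root.relativize(p), needle)) {
                continue; // the test data itself does not count as a reference
            }
            // ISO-8859-1 maps every byte to a char, so binary files decode without errors.
            String content = new String(Files.readAllBytes(p), StandardCharsets.ISO_8859_1);
            if (content.contains(needle)) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src");   // assumed default
        String needle = args.length > 1 ? args[1] : "testcrawl";
        for (Path p : filesMentioning(root, needle)) {
            System.out.println(p);
        }
    }
}
```

An empty result when run from the checkout root would be consistent with the observation above; build files outside `src` would need to be scanned as well by passing a different root.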

      My suggestion:

      • we remove the testcrawl (it's not used)
      • if we don't remove it, we at least run Andrzej's tool on it to upgrade it to o.a.h.io.Text keys.

      I'll take care of this.


            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Chris A. Mattmann (chrismattmann)
