Nutch / NUTCH-2182

Make reverseUrlDirs file dumper option hash the URL for consistency


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.12
    • Component/s: tool
    • Labels:
      None

      Description

      At the moment the "reverseUrlDirs" option for FileDumper is terribly brittle and fails on a fair number of edge cases. A more robust way to handle the reverse URL approach to dumping a file is to reverse the server part of the URL and use a hash of the URL as the file name. This gives us a nice split of files while avoiding a number of likely classes of errors that cause dumps to fail.
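
The idea described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class name `ReverseUrlPathSketch`, the helpers `reverseHost`, `sha256Hex`, and `dumpPath`, and the choice of SHA-256 as the hash are all assumptions made for the example.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: reversed host as directory path, URL hash as file name.
public class ReverseUrlPathSketch {

    // "www.example.com" becomes "com/example/www".
    static String reverseHost(String host) {
        List<String> parts = Arrays.asList(host.split("\\."));
        Collections.reverse(parts);
        return String.join("/", parts);
    }

    // Hex-encoded SHA-256 of the full URL (hash choice is an assumption).
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Relative dump path: reversed host directories plus the hashed URL.
    static String dumpPath(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return reverseHost(host) + "/" + sha256Hex(url);
    }

    public static void main(String[] args) {
        System.out.println(dumpPath("http://www.example.com/a/b?c=d"));
    }
}
```

Because the file name is a fixed-length hex string, the path, query string, and any unusual characters in the URL never end up in the file system path; only the host name does.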

        Attachments

        1. NUTCH-2182_joyce_8Dec2015.patch
          1 kB
          Michael Joyce


              People

              • Assignee:
                mjoyce Michael Joyce
                Reporter:
                mjoyce Michael Joyce
              • Votes:
                0
                Watchers:
                2
