NUTCH-2349: urlnormalizer-basic NPE for ill-formed URL "http:/"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4, 1.13
    • Fix Version/s: 2.4, 1.13
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

      Description

      NUTCH-2337 introduced a potential (though rare) NullPointerException that is thrown when an ill-formed URL is normalized. The affected URLs consist of just the protocol followed by ":", ":/", ":////" or even more slashes:

      % echo "http://///" \
        | runtime/local/bin/nutch org.apache.nutch.net.URLNormalizerChecker \
           -normalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer 
      Checking URLNormalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
      Exception in thread "main" java.lang.NullPointerException
              at org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:120)
              at org.apache.nutch.net.URLNormalizerChecker.checkOne(URLNormalizerChecker.java:72)
              at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:110)
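
      The class of input described above (a protocol followed only by ":" and any number of slashes) can be recognized with a plain string check before a URL is handed to a normalizer. The following is a minimal sketch under stated assumptions, not the patch applied for this issue: the class SchemeOnlyUrlCheck and the method isSchemeOnly are hypothetical names introduced for illustration and are not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: flags URLs that consist of
// nothing but a scheme, a colon, and zero or more slashes.
final class SchemeOnlyUrlCheck {

    // Scheme characters (letter first, then letters, digits, '+', '.', '-'),
    // a colon, then only slashes up to the end of the string.
    private static final Pattern SCHEME_ONLY =
            Pattern.compile("[A-Za-z][A-Za-z0-9+.\\-]*:/*");

    private SchemeOnlyUrlCheck() {
    }

    // Returns true for inputs such as "http:", "http:/" or "http://///".
    static boolean isSchemeOnly(String url) {
        if (url == null) {
            return false;
        }
        // matches() requires the whole (trimmed) string to fit the pattern.
        return SCHEME_ONLY.matcher(url.trim()).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"http:", "http:/", "http://///", "http://example.com/"};
        for (String s : samples) {
            System.out.println(s + " -> " + isSchemeOnly(s));
        }
    }
}
```

      A caller could skip or reject a URL for which isSchemeOnly returns true instead of passing it on. Whether such a URL should be dropped or returned unchanged is a design decision this sketch does not settle.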
      

              People

              • Assignee: Sebastian Nagel (snagel)
              • Reporter: Sebastian Nagel (snagel)