Nutch / NUTCH-2394

Possible bugs in the source code

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14
    • Component/s: None

      Description

      Hi!
      I've checked your project with the static analyzer AppChecker, and it found several suspicious code fragments:
      1) src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java

      heading.trim();
      

      heading is not changed, because java.lang.String.trim() returns a new string and leaves the original untouched (Java strings are immutable).
      Probably, it should be:

      heading = heading.trim();
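      A minimal, self-contained sketch of the pitfall in item 1. The class HeadingTrim and its methods buggy and fixed are hypothetical illustrations, not code from HeadingsParseFilter; they only contrast discarding the result of trim() with reassigning it.

```java
class HeadingTrim {
    // Buggy pattern: the String returned by trim() is discarded,
    // so the caller gets the original, untrimmed value back
    static String buggy(String heading) {
        heading.trim();
        return heading;
    }

    // Fixed pattern: keep the new String that trim() returns
    static String fixed(String heading) {
        heading = heading.trim();
        return heading;
    }

    public static void main(String[] args) {
        String raw = "  Introduction  ";
        System.out.println("[" + buggy(raw) + "]"); // [  Introduction  ]
        System.out.println("[" + fixed(raw) + "]"); // [Introduction]
        System.out.println("[" + raw + "]");        // [  Introduction  ] (raw itself is never modified)
    }
}
```

      Because String is immutable, no method call on heading can change it in place; only the reassignment makes the trimmed copy visible to later code.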
      

      see also:

      2) src/java/org/apache/nutch/crawl/URLPartitioner.java#L84

      if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
        ...
      else if ..
        ...
        InetAddress address = InetAddress.getByName(url.getHost());
        ...
      

      If url is null, the first condition is false, so control can fall through to the else-if branch, where url.getHost() is invoked on a null reference and a NullPointerException will be thrown.
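      A minimal sketch contrasting the reported control flow with a version that rejects a null URL once, before any branch dereferences it. The class PartitionKey, the methods keyUnsafe and keySafe, and the mode strings are hypothetical and are not taken from URLPartitioner; url.getHost() stands in for real domain extraction, and the empty-string fallback for a null URL is an assumption (a real partitioner might skip or log such records instead).

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URL;
import java.net.UnknownHostException;

class PartitionKey {
    // Hypothetical mode names for this sketch
    static final String MODE_DOMAIN = "byDomain";
    static final String MODE_IP = "byIP";

    // Reported shape: the null check guards only the first branch
    static String keyUnsafe(URL url, String mode) throws UnknownHostException {
        if (mode.equals(MODE_DOMAIN) && url != null) {
            return url.getHost(); // stand-in for real domain extraction
        } else if (mode.equals(MODE_IP)) {
            // url can still be null here, so url.getHost() throws NullPointerException
            return InetAddress.getByName(url.getHost()).getHostAddress();
        }
        return "";
    }

    // Safer shape: handle a null URL once, before any branch dereferences it
    static String keySafe(URL url, String mode) throws UnknownHostException {
        if (url == null) {
            return ""; // assumed fallback key for this sketch
        }
        if (mode.equals(MODE_DOMAIN)) {
            return url.getHost();
        } else if (mode.equals(MODE_IP)) {
            return InetAddress.getByName(url.getHost()).getHostAddress();
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        // An IP literal host avoids a DNS lookup in InetAddress.getByName
        URL local = URI.create("http://127.0.0.1/page").toURL();
        System.out.println(keySafe(local, MODE_IP));          // 127.0.0.1
        System.out.println(keySafe(null, MODE_IP).isEmpty()); // true
        try {
            keyUnsafe(null, MODE_IP);
        } catch (NullPointerException e) {
            System.out.println("unsafe version threw NullPointerException");
        }
    }
}
```

      Hoisting the null check out of the first condition keeps every later branch free of the implicit "url might be null" case.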

      3) src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346

      String[] fullPathLevels = fullDir.split(File.separator);
      

      Using File.separator as a regular expression may throw java.util.regex.PatternSyntaxException, because File.separator is "\" on Windows-based systems, and a single backslash is an incomplete escape sequence in a regex.
      Possible correction:

      String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
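      A minimal sketch of why the quoted form is safe. The class SeparatorSplit and its methods are hypothetical; the Windows separator is passed explicitly as "\\" so the behaviour can be shown on any platform, while splitPath mirrors the suggested correction by quoting File.separator.

```java
import java.io.File;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SeparatorSplit {
    // Split on a literal separator; Pattern.quote escapes any regex metacharacters
    static String[] splitOn(String path, String separator) {
        return path.split(Pattern.quote(separator));
    }

    // Platform-dependent variant matching the suggested correction
    static String[] splitPath(String path) {
        return splitOn(path, File.separator);
    }

    public static void main(String[] args) {
        String windowsSep = "\\"; // the value of File.separator on Windows
        String dir = "C:\\crawl\\dump\\segment";
        try {
            dir.split(windowsSep); // unquoted: a lone "\" is not a valid regex
        } catch (PatternSyntaxException e) {
            System.out.println("unquoted split failed: " + e.getDescription());
        }
        // Quoted: the backslash is matched literally
        System.out.println(String.join(" | ", splitOn(dir, windowsSep))); // C: | crawl | dump | segment
    }
}
```

      On Unix-like systems File.separator is "/", which is not a regex metacharacter, so the unquoted call happens to work there; quoting makes the code correct on both platforms.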
      

              People

              • Assignee: Unassigned
              • Reporter: AppChecker
              • Votes: 2
              • Watchers: 5
