Nutch / NUTCH-2394

Possible bugs in the source code

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14
    • None

    Description

      Hi!
      I've checked your project with the static analyzer AppChecker, and it found several suspicious code fragments:
      1) src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java

      heading.trim();
      

      heading is not changed, because java.lang.String.trim returns a new string instead of modifying the original.
      Probably, it should be:

      heading = heading.trim();
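
To illustrate the point about String immutability, here is a minimal, self-contained sketch. The class and method names (TrimSketch, discarded, reassigned) are hypothetical and are not taken from HeadingsParseFilter; they only contrast the two call patterns.

```java
// Hypothetical demo class; not part of Nutch.
class TrimSketch {

    // Mirrors the reported pattern: the trimmed copy is thrown away.
    static String discarded(String heading) {
        heading.trim(); // result ignored, so heading still has its whitespace
        return heading;
    }

    // Mirrors the suggested fix: the trimmed copy is assigned back.
    static String reassigned(String heading) {
        heading = heading.trim();
        return heading;
    }

    public static void main(String[] args) {
        System.out.println("[" + discarded("  Title  ") + "]");
        System.out.println("[" + reassigned("  Title  ") + "]");
    }
}
```

Only the second variant returns the heading without its surrounding whitespace.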
      

      see also:

      2) src/java/org/apache/nutch/crawl/URLPartitioner.java#L84

      if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
        ...
      else if ..
        ...
        InetAddress address = InetAddress.getByName(url.getHost());
        ...
      

      If url is null, the first condition is false and control falls through to the else branch, where url.getHost() is invoked, so a NullPointerException will be thrown.
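
One possible way to avoid this is to check url for null once, before any branch dereferences it. The sketch below is not the Nutch code: the class name, the mode strings, the parseOrNull helper, and the fallback to the raw string's hash are all assumptions made for illustration.

```java
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.UnknownHostException;

// Hypothetical sketch; names and fallback behavior are assumptions.
class NullSafePartitionSketch {
    static final String MODE_DOMAIN = "byDomain"; // assumed mode name
    static final String MODE_IP = "byIP";         // assumed mode name

    // Returns null when the string cannot be parsed as an absolute URL.
    static URL parseOrNull(String urlString) {
        try {
            return new URI(urlString).toURL();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return null;
        }
    }

    // Computes a hash for partitioning without dereferencing a null url.
    static int partitionHash(String urlString, URL url, String mode) {
        int hashCode = urlString.hashCode(); // fallback when url is unusable
        if (url == null) {
            return hashCode; // single guard covers every branch below
        }
        if (MODE_DOMAIN.equals(mode)) {
            hashCode = url.getHost().hashCode(); // stand-in for domain extraction
        } else if (MODE_IP.equals(mode)) {
            try {
                InetAddress address = InetAddress.getByName(url.getHost());
                hashCode = address.getHostAddress().hashCode();
            } catch (UnknownHostException e) {
                // keep the fallback hash
            }
        }
        return hashCode;
    }
}
```

With the guard hoisted to the top, an unparsable URL yields the fallback hash in every mode instead of an exception.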

      3) src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346

      String[] fullPathLevels = fullDir.split(File.separator);
      

      Using File.separator as a regular expression may throw java.util.regex.PatternSyntaxException, because String.split treats its argument as a regex and File.separator is "\" on Windows-based systems.
      Possible correction:

      String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
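
The difference can be reproduced on any platform by passing the separator explicitly instead of reading File.separator. This is a small sketch; the class and method names are hypothetical and do not come from CommonCrawlDataDumper.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of Nutch.
class SeparatorSplitSketch {

    // Splits on the separator as a literal string, not as a regex.
    static String[] splitPath(String path, String separator) {
        return path.split(Pattern.quote(separator));
    }

    // Reports whether using the raw separator as a regex fails to compile.
    static boolean unquotedSplitFails(String path, String separator) {
        try {
            path.split(separator);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String windowsPath = "C:\\data\\segments";
        // A lone backslash is an incomplete escape sequence in a regex.
        System.out.println(unquotedSplitFails(windowsPath, "\\"));
        System.out.println(String.join(",", splitPath(windowsPath, "\\")));
    }
}
```

Pattern.quote wraps the separator so that it is matched literally, which makes the split behave the same with "/" and "\".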
      

            People

              Assignee: Unassigned
              Reporter: AppChecker
              Votes: 2
              Watchers: 6
