Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.13
-
None
Description
Hi!
I've checked your project with the static analyzer AppChecker, and it found several suspicious code fragments:
1) src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java
heading.trim();
heading is not changed, because java.lang.String.trim returns a new string and the result is discarded here.
Probably, it should be:
heading = heading.trim();
see also:
- src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78
- src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115
- src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76
- src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78
- src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326
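To illustrate item 1, here is a minimal standalone sketch (not the Nutch code itself; the class and method names are made up for the example) showing that a discarded trim() result leaves the value untouched:

```java
public class TrimDemo {
    // Buggy pattern: String is immutable, so the trimmed copy is thrown away.
    static String trimIgnored(String heading) {
        heading.trim();
        return heading;
    }

    // Fixed pattern: assign the returned string back to the variable.
    static String trimAssigned(String heading) {
        heading = heading.trim();
        return heading;
    }

    public static void main(String[] args) {
        String raw = "  Title  ";
        System.out.println("[" + trimIgnored(raw) + "]");  // prints "[  Title  ]"
        System.out.println("[" + trimAssigned(raw) + "]"); // prints "[Title]"
    }
}
```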
2) src/java/org/apache/nutch/crawl/URLPartitioner.java#L84
if (mode.equals(PARTITION_MODE_DOMAIN) && url != null) ... else if .. ... InetAddress address = InetAddress.getByName(url.getHost()); ...
If url is null, the first condition is false, so control falls through to the else branches, where url.getHost() is invoked and a NullPointerException will be thrown.
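One possible way to avoid this is to check for null once, before any branch touches url. The sketch below is only an illustration under assumptions: the class, the mode constant, the fallback parameter and the returned keys are hypothetical, and the InetAddress lookup from the original code is left out so the example runs without network access.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class NullGuardDemo {
    // Hypothetical mode name, used only for this example.
    static final String MODE_DOMAIN = "byDomain";

    // Returns null instead of throwing when the string is not a usable URL.
    static URL parseOrNull(String s) {
        try {
            return URI.create(s).toURL();
        } catch (IllegalArgumentException | MalformedURLException e) {
            return null;
        }
    }

    // The null check comes first, so no later branch can dereference a null url.
    static String partitionKey(String mode, URL url, String fallback) {
        if (url == null) {
            return fallback;
        }
        if (MODE_DOMAIN.equals(mode)) {
            return "domain:" + url.getHost();
        }
        return "host:" + url.getHost();
    }

    public static void main(String[] args) {
        URL ok = parseOrNull("http://example.org/page");
        System.out.println(partitionKey(MODE_DOMAIN, ok, "raw"));   // prints "domain:example.org"
        System.out.println(partitionKey("byHost", null, "raw"));    // prints "raw"
    }
}
```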
3) src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346
String[] fullPathLevels = fullDir.split(File.separator);
Using File.separator as a regular expression may throw java.util.regex.PatternSyntaxException, because on Windows-based systems it is "\", which is an incomplete escape sequence when used as a pattern.
Possible correction:
String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
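A small standalone sketch of the difference follows. It passes the separator as a parameter so the Windows case can be reproduced on any platform; the class and method names are assumptions for the example, not part of Nutch.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SplitDemo {
    // Quoting makes the separator a literal, whatever characters it contains.
    static String[] splitPath(String path, String separator) {
        return path.split(Pattern.quote(separator));
    }

    // Reports whether using the separator directly as a regex fails.
    static boolean rawSplitFails(String path, String separator) {
        try {
            path.split(separator);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String windowsPath = "C:\\data\\dump"; // the text C:\data\dump
        System.out.println(rawSplitFails(windowsPath, "\\"));    // prints "true"
        System.out.println(splitPath(windowsPath, "\\").length); // prints "3"
        System.out.println(splitPath("a/b/c", "/").length);      // prints "3"
    }
}
```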