Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.10
None
Description
Invoking the commoncrawldump tool with the -gzip option and -mimetype application/pdf produces the following stack trace, and the task fails:
java.lang.RuntimeException: file name 'Socio-Economic%20Impact%20of%20Ebola%20on%20Households%20in%20Liberia%20Nov%2019%20(final,%20revised).pdf' is too long ( > 100 bytes)
    at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.handleLongName(TarArchiveOutputStream.java:674)
    at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.putArchiveEntry(TarArchiveOutputStream.java:275)
    at org.apache.nutch.tools.CommonCrawlDataDumper.dump(CommonCrawlDataDumper.java:400)
    at org.apache.nutch.tools.CommonCrawlDataDumper.main(CommonCrawlDataDumper.java:236)
The current workaround is to omit the -gzip option and defer compression to a later task; this avoids the failure but does not fix the underlying problem.
We need to fix this in order for the tool to work as designed and required.
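To make the failure condition concrete, here is a minimal sketch that reproduces the length check implied by the error message, using only the JDK. The class name `TarNameCheck` and its methods are hypothetical and are not part of Nutch or Commons Compress. The 100-byte threshold is taken from the message text, and UTF-8 encoding is an assumption; the library's exact boundary check may differ. One possible direction for a fix, not confirmed here as the change actually applied, is Commons Compress's `TarArchiveOutputStream.setLongFileMode(...)` with `LONGFILE_GNU` or `LONGFILE_POSIX`, which lets the stream write longer names instead of throwing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only: checks whether an entry name
// would overflow the 100-byte name field reported in the error message.
public class TarNameCheck {

    // Threshold taken from the error message "( > 100 bytes)".
    public static final int TAR_NAME_FIELD_BYTES = 100;

    // Length of the entry name once encoded (UTF-8 is an assumption).
    public static int encodedLength(String entryName) {
        return entryName.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the encoded name is longer than the name field.
    public static boolean exceedsNameField(String entryName) {
        return encodedLength(entryName) > TAR_NAME_FIELD_BYTES;
    }

    public static void main(String[] args) {
        // File name copied from the stack trace in this report.
        String name = "Socio-Economic%20Impact%20of%20Ebola%20on%20Households"
                + "%20in%20Liberia%20Nov%2019%20(final,%20revised).pdf";
        System.out.println(encodedLength(name) + " bytes, too long: "
                + exceedsNameField(name));
    }
}
```

A check like this could be used to log or count affected entries before archiving, but it only detects the condition; the dump would still need a long-name mode (or shortened entry names) to succeed with -gzip.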