Description
When called with the -recode flag, SegmentReader fails with an NPE while trying to stringify the raw content of unparsed documents:
$> bin/nutch readseg -dump crawl/segments/20231009065431 crawl/segreader/20231009065431 -recode
...
2023-10-09 07:55:18,451 INFO mapreduce.Job: Task Id : attempt_1696825862783_0005_r_000000_0, Status : FAILED
Error: java.lang.NullPointerException: charset
    at java.base/java.lang.String.<init>(String.java:504)
    at java.base/java.lang.String.<init>(String.java:561)
    at org.apache.nutch.protocol.Content.toString(Content.java:297)
    at org.apache.nutch.segment.SegmentReader$InputCompatReducer.reduce(SegmentReader.java:189)
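The trace points at the String(byte[], Charset) constructor, which rejects a null charset. The standalone sketch below reproduces that failure mode and shows one possible null-safe fallback. It does not use Nutch code: the class RecodeSketch, the helper decode, and the choice of UTF-8 as the fallback are assumptions made for illustration, and it presumes that no charset is available for documents that were never parsed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RecodeSketch {

    /**
     * Hypothetical helper: decodes raw content with the detected charset,
     * falling back to UTF-8 (an assumed default) when none is known.
     */
    static String decode(byte[] raw, Charset detected) {
        Charset cs = (detected != null) ? detected : StandardCharsets.UTF_8;
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        byte[] raw = "unparsed".getBytes(StandardCharsets.UTF_8);

        // Same failure mode as in the report: a null Charset passed to String.
        try {
            new String(raw, (Charset) null);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }

        // With the fallback, the raw bytes are still stringified.
        System.out.println(decode(raw, null));
    }
}
```

Whether a fixed default is appropriate or the charset should instead be detected from the raw bytes is a design decision this sketch leaves open.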