Nutch / NUTCH-3012

SegmentReader when dumping with option -recode: NPE on unparsed documents

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.19
    • Fix Version/s: 1.20
    • Component/s: segment
    • Labels: None
    • Patch Info: Patch Available

    Description

      SegmentReader, when called with the -recode flag, fails with a NullPointerException (NPE) while trying to convert the raw content of unparsed documents to a string:

      $> bin/nutch readseg  -dump crawl/segments/20231009065431 crawl/segreader/20231009065431 -recode
      ...
      2023-10-09 07:55:18,451 INFO mapreduce.Job: Task Id : attempt_1696825862783_0005_r_000000_0, Status : FAILED
      Error: java.lang.NullPointerException: charset
              at java.base/java.lang.String.<init>(String.java:504)
              at java.base/java.lang.String.<init>(String.java:561)
              at org.apache.nutch.protocol.Content.toString(Content.java:297)
              at org.apache.nutch.segment.SegmentReader$InputCompatReducer.reduce(SegmentReader.java:189)
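
The top frames of the trace point at a java.lang.String constructor being handed a null charset. The following is a minimal, self-contained sketch of that failure mode and of one possible guard. It is not the Nutch code and not necessarily the fix that was committed: the class name RecodeSketch, the helper decode, and the choice of UTF-8 as fallback are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RecodeSketch {

    // Hypothetical helper (not part of Nutch): decode raw bytes, falling back
    // to UTF-8 when no charset could be determined, e.g. because the document
    // was never parsed. The UTF-8 fallback is an assumption of this sketch.
    public static String decode(byte[] raw, Charset detected) {
        Charset charset = (detected != null) ? detected : StandardCharsets.UTF_8;
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] raw = "<html>caf\u00e9</html>".getBytes(StandardCharsets.UTF_8);
        Charset unknown = null; // stands in for "no charset known for this document"

        try {
            // Passing a null Charset to the String constructor throws an NPE.
            new String(raw, unknown);
        } catch (NullPointerException e) {
            System.out.println("unguarded decode failed: " + e);
        }

        // The guarded variant substitutes the fallback charset instead of failing.
        System.out.println("guarded decode: " + decode(raw, unknown));
    }
}
```

Whether a fixed fallback charset is the right behavior for readseg is a design choice; an alternative would be to skip recoding entirely for records that carry no parse metadata.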
      

            People

              Assignee: Sebastian Nagel (snagel)
              Reporter: Sebastian Nagel (snagel)
              Votes: 0
              Watchers: 4
