Nutch / NUTCH-2715

WARCExporter fails on large records

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.16
    • Component/s: None
    • Labels: None

    Description

      com.martinkl.warc.WARCRecord throws an IllegalStateException when a single line exceeds 10,000 bytes. Because WARCExporter does not catch this exception, one such record fails the whole export.

      I doubt the validity of this limitation in WARCRecord, but regardless, I think WARCExporter should catch the exception and skip to the next record.

      (See also https://github.com/ept/warc-hadoop/issues/5)
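
The proposed behavior (catch the exception, skip the offending record, continue with the next one) could look roughly like the sketch below. This is not the actual WARCExporter code or the committed fix: `buildRecord` is a hypothetical stand-in for the WARCRecord construction step, `exportAll` is a hypothetical export loop, and the 10,000-byte limit is taken from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipOversizedRecords {

    // Assumed limit, taken from the issue description ("over 10,000 bytes").
    static final int MAX_LINE_BYTES = 10_000;

    // Hypothetical stand-in for building a WARC record: rejects any input
    // that contains a single line longer than MAX_LINE_BYTES.
    static String buildRecord(String raw) {
        for (String line : raw.split("\n", -1)) {
            int length = line.getBytes(StandardCharsets.UTF_8).length;
            if (length > MAX_LINE_BYTES) {
                throw new IllegalStateException(
                    "Line of " + length + " bytes exceeds " + MAX_LINE_BYTES);
            }
        }
        return raw;
    }

    // Hypothetical export loop: a failing record is logged and skipped
    // instead of aborting the whole export.
    static List<String> exportAll(List<String> rawRecords) {
        List<String> exported = new ArrayList<>();
        for (String raw : rawRecords) {
            try {
                exported.add(buildRecord(raw));
            } catch (IllegalStateException e) {
                System.err.println("Skipping record: " + e.getMessage());
            }
        }
        return exported;
    }

    public static void main(String[] args) {
        char[] big = new char[MAX_LINE_BYTES + 1];
        Arrays.fill(big, 'x');
        List<String> input = Arrays.asList("short record", new String(big), "another record");
        System.out.println("Exported " + exportAll(input).size() + " of " + input.size() + " records");
    }
}
```

Catching only IllegalStateException keeps other failures (for example I/O errors) visible; whether that is the right scope for the real exporter is a judgment call.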

          People

            Assignee: Unassigned
            Reporter: Yossi Tamari (yossi)