Details
Bug
Status: Resolved
Blocker
Resolution: Fixed
Description
The WARC parser works for uncompressed WARC files, but for gzipped WARC files it does not appear to identify all embedded files.
Processing a WARC.GZ file should return the same JSON output as processing the plain WARC file, plus the metadata for the GZ container itself. However, in the attached JSON outputs, the JPEG present in the plain WARC file is not represented in the WARC.GZ.json file.
Additionally, the warc: metadata is not returned for all files, although this may be by design.
Attached are two JSON files, one for the gzipped WARC file and one for the plain WARC file, along with the two original files.
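
To make the discrepancy easier to check, here is a minimal Python sketch that compares two JSON outputs and lists the media types that occur more often in the plain WARC output than in the WARC.GZ output. It rests on assumptions not confirmed in this report: that each JSON file is a list of per-file metadata objects, and that the media type is stored under a "Content-Type" key. The function names are hypothetical, and the key can be changed through the `key` parameter.

```python
import json
from collections import Counter


def load_records(path):
    """Load a JSON file assumed to hold a list of per-file metadata objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def type_counts(records, key="Content-Type"):
    """Count records by media type, dropping parameters such as charset."""
    counts = Counter()
    for record in records:
        value = record.get(key, "unknown")
        # Some outputs store multi-valued fields as lists; use the first value.
        if isinstance(value, list):
            value = value[0] if value else "unknown"
        counts[str(value).split(";")[0].strip()] += 1
    return counts


def missing_from_gz(plain_records, gz_records, key="Content-Type"):
    """Return media types seen more often in the plain output than in the gz output.

    Counter subtraction keeps only positive differences, so the extra record
    for the GZ container itself does not appear in the result.
    """
    return type_counts(plain_records, key) - type_counts(gz_records, key)
```

Calling `missing_from_gz(load_records(plain_path), load_records(gz_path))` on the two attached outputs should return an empty counter if the parser treats both inputs the same way. With the behaviour described above, an entry for the JPEG would be expected instead.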