Details
Type: Task
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
Junrar is great and doesn't require any external dependencies. However, it doesn't handle RAR v5. I tried UNRAR 5.61 beta 1 (freeware) on some of the v5 files in our regression corpus, and I can confirm that unrar handles them whereas Tika does not.
The parser would need to create a temporary directory, copy the input stream to a file there, run unrar, process the extracted files, and then clean up the directory.
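The steps above could be sketched roughly as follows. This is only an illustration, not Tika's actual parser: the class name, the method names, the 60-second timeout, and the handler callback are all hypothetical, and it assumes an unrar binary is on the PATH that accepts the e command with the -or (rename automatically) and -y (assume yes) switches.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the temp-dir / extract / process / clean-up cycle.
class UnrarWorkflowSketch {

    // Builds the command line. The destination must end with a path
    // separator so that unrar treats it as a directory.
    static List<String> buildCommand(Path archive, Path outDir) {
        return List.of("unrar", "e", "-or", "-y",
                archive.toString(), outDir.toString() + File.separator);
    }

    // Copies the stream to a temp file, runs unrar, hands each extracted
    // file to the handler, and always removes the temp directory afterwards.
    static void process(InputStream in, Consumer<Path> handler)
            throws IOException, InterruptedException {
        Path tmpDir = Files.createTempDirectory("unrar-sketch-");
        try {
            Path archive = tmpDir.resolve("input.rar");
            Files.copy(in, archive, StandardCopyOption.REPLACE_EXISTING);
            Path outDir = Files.createDirectory(tmpDir.resolve("out"));

            Process p = new ProcessBuilder(buildCommand(archive, outDir))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!p.waitFor(60, TimeUnit.SECONDS)) { // assumed timeout
                p.destroyForcibly();
                throw new IOException("unrar timed out");
            }
            if (p.exitValue() != 0) {
                throw new IOException("unrar failed with exit code " + p.exitValue());
            }
            try (Stream<Path> files = Files.list(outDir)) {
                files.filter(Files::isRegularFile).forEach(handler);
            }
        } finally {
            deleteRecursively(tmpDir);
        }
    }

    // Deletes children before parents by walking in reverse order.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
            for (Path path : paths) {
                Files.deleteIfExists(path);
            }
        }
    }
}
```

A caller might pass a handler that feeds each extracted file to an embedded-document parser. Doing the clean-up in a finally block means the temporary directory is removed even when unrar fails or times out.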
We can get full path information from the l command: unrar l blah.rar
We can tell unrar not to overwrite files with the same name by having it rename them automatically: unrar e -or bug_trackers/LIBRE_OFFICE/131138-137877/LIBRE_OFFICE-135119-0.rar
If we trust unrar to protect against path traversal (e.g. an embedded file with the name "../../../something_bad.pdf"), we can use the x command, which extracts files with their full paths.
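If we would rather not rely on unrar alone for this, one option is to check each entry name (for example, as reported by the l command) before extracting with full paths. The following is a minimal sketch of such a check; the class and method names are hypothetical and it is not taken from Tika.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical helper: rejects entry names that would escape destDir.
class EntryPathCheck {

    static boolean staysInside(Path destDir, String entryName) {
        Path base = destDir.toAbsolutePath().normalize();
        try {
            // Archive listings may use either separator; treat '\' as '/'.
            Path resolved = base.resolve(entryName.replace('\\', '/')).normalize();
            // An absolute entry name replaces base entirely, so it fails
            // the startsWith check just like a "../" escape does.
            return resolved.startsWith(base) && !resolved.equals(base);
        } catch (InvalidPathException e) {
            // Names the file system cannot represent are treated as unsafe.
            return false;
        }
    }
}
```

With this check, an entry named "../../../something_bad.pdf" would be rejected, while "docs/report.pdf" would be accepted.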