Description
The names of tar entries are currently encoded and decoded by plain 8-bit conversions of byte to char and vice versa. This prevents the use of encodings such as UTF-8 in file names. Whether using UTF-8 (or any other non-ASCII encoding) in file names is sensible is a separate discussion. However, tar archives containing files whose names are encoded in UTF-8 do exist in the wild. commons-compress currently cannot read these names correctly because the encoding is hardcoded to plain 8-bit.
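To illustrate the problem, here is a minimal, self-contained sketch. It does not use the commons-compress code itself; the class and method names (TarNameDecodingDemo, decodeEightBit, decodeUtf8) are made up for this example, and decodeEightBit only mimics what a plain byte-to-char conversion does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of commons-compress.
final class TarNameDecodingDemo {

    // Mimics a plain 8-bit conversion: every byte becomes exactly one char.
    static String decodeEightBit(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the same bytes as UTF-8, so multi-byte sequences become one char.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e4rger.txt"; // "a" with diaeresis, then "rger.txt"
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // The two-byte UTF-8 sequence C3 A4 turns into two separate chars.
        System.out.println(decodeEightBit(raw).equals(original)); // false
        System.out.println(decodeUtf8(raw).equals(original));     // true
    }
}
```

With the 8-bit conversion, the nine-character name comes back as ten characters, because the two bytes of the UTF-8 sequence are treated as two Latin-1 characters.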
The supplied patch allows encodings other than plain 8-bit to be used by means of a TarArchiveCodec structure. It does not change the default behavior; it only adds the option of using a different encoding.
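The general idea of a pluggable name codec could be sketched roughly as follows. This is not the code from the patch: the NameCodec interface, the forCharset factory and the DEFAULT constant are hypothetical names chosen for illustration, and the actual TarArchiveCodec API may look different. ISO-8859-1 is used here as a stand-in for the plain 8-bit default, since it maps each byte to the char with the same value when decoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the real TarArchiveCodec in the patch may differ.
final class NameCodecSketch {

    // Assumed shape of a codec that converts between name bytes and Strings.
    interface NameCodec {
        String decode(byte[] buffer, int offset, int length);
        byte[] encode(String name);
    }

    // Builds a codec backed by an arbitrary java.nio charset.
    static NameCodec forCharset(final Charset charset) {
        return new NameCodec() {
            @Override
            public String decode(byte[] buffer, int offset, int length) {
                // tar name fields are NUL-padded, so stop at the first NUL byte.
                int end = offset;
                while (end < offset + length && buffer[end] != 0) {
                    end++;
                }
                return new String(buffer, offset, end - offset, charset);
            }

            @Override
            public byte[] encode(String name) {
                return name.getBytes(charset);
            }
        };
    }

    // Default that keeps the one-byte-per-char behaviour when decoding.
    static final NameCodec DEFAULT = forCharset(StandardCharsets.ISO_8859_1);

    public static void main(String[] args) {
        byte[] field = new byte[100]; // the name field of a tar header is 100 bytes
        byte[] raw = "\u00e4rger.txt".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(raw, 0, field, 0, raw.length);

        NameCodec utf8 = forCharset(StandardCharsets.UTF_8);
        System.out.println(utf8.decode(field, 0, field.length).length());    // 9
        System.out.println(DEFAULT.decode(field, 0, field.length).length()); // 10
    }
}
```

Keeping the 8-bit behaviour as the default and making the charset an explicit choice matches the stated goal of not changing the standard functionality.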
A method was added to the TarUtilsTest JUnit test to cover the added functionality.
Attachments
Issue Links
- is blocked by COMPRESS-184 PAX header parser fails for non-ASCII values (Resolved)