Found and fixed several bugs involving Hadoop archives:
- In makeQualified(), the sloppy conversion from Path to URI and back mangles the path if it contains an escape-worthy character.
- fileStatusInIndex() may have to read more than one segment of the index; the LineReader and the count of bytes read need to be reset for each block.
- har:// connections cannot be indexed by (scheme, authority, username); the path is significant as well. Caching them that way limits a Hadoop client to opening one archive per filesystem. It seems safe not to cache them at all, since they wrap another connection that does the actual networking.
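The makeQualified() mangling can be reproduced with plain java.net.URI. This is a minimal sketch, not Hadoop code (the class and method names here are made up): parsing a path string as an already-encoded URI decodes its percent escapes, while the multi-argument constructor quotes the path component so it round-trips intact.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathUriRoundTrip {
    // Naive round trip: treat the raw path string as an already-encoded
    // URI. Parsing decodes "%25" to "%", so the path comes back mangled.
    static String naivePath(String encodedLooking) {
        try {
            return new URI(encodedLooking).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The multi-argument constructor quotes illegal characters in the path
    // component instead of interpreting them, so getPath() round-trips.
    static String safePath(String rawPath) {
        try {
            return new URI(null, null, rawPath, null).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(naivePath("/archive/file%20name%25v1")); // /archive/file name%v1 (mangled)
        System.out.println(safePath("/archive/file name%25v1"));    // /archive/file name%25v1 (intact)
    }
}
```

The fix amounts to building the qualified URI from components rather than re-parsing a stringified path.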
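The fileStatusInIndex() fix can be sketched with stdlib readers standing in for Hadoop's LineReader (the segment layout and the find() helper are assumptions for illustration): the reader and the byte count are created fresh at the top of every segment, so the scan correctly continues into the second and later segments instead of stopping after the first.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class SegmentedIndexScan {
    // Look up `name` in an index split across several segments. The reader
    // and the count of bytes read are reset for every segment; carrying
    // either across segments (the original bug) would end the scan after
    // the first segment's byte budget was exhausted.
    static String find(List<byte[]> segments, String name) {
        for (byte[] seg : segments) {
            // Fresh reader and fresh byte count for this segment.
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new ByteArrayInputStream(seg), StandardCharsets.UTF_8));
            long bytesRead = 0;
            try {
                String line;
                while (bytesRead < seg.length && (line = reader.readLine()) != null) {
                    bytesRead += line.getBytes(StandardCharsets.UTF_8).length + 1;
                    if (line.startsWith(name + " ")) {
                        return line;
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return null; // not found in any segment
    }

    public static void main(String[] args) {
        List<byte[]> segments = Arrays.asList(
                "/a 1\n/b 2\n".getBytes(StandardCharsets.UTF_8),
                "/c 3\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(find(segments, "/c")); // /c 3
    }
}
```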
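The caching limitation follows directly from the key shape: if the archive path is not part of the cache key, every archive on one cluster maps to the same slot. A toy illustration (the key contents are made up, not Hadoop's actual cache):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FsCacheKey {
    // Simulate a filesystem cache keyed only by (scheme, authority, user):
    // the archive path is absent from the key, so every har:// archive on
    // the same cluster collapses into a single cache entry.
    static int distinctEntries(String... archivePaths) {
        Map<List<String>, String> cache = new HashMap<>();
        for (String path : archivePaths) {
            cache.put(Arrays.asList("har", "namenode:8020", "user"), path);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // Two different archives, but only one cache slot survives.
        System.out.println(distinctEntries("/data/one.har", "/data/two.har")); // 1
    }
}
```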