Nicholas, yes, I would like to store this information and not apply it to the part files (as you pointed out, this would probably lead to inconsistencies). However, I would like the correct ownership, permissions, and times to show up when someone does an ls on a har:// path. Currently, the code creates FileStatus objects by taking part of the information (e.g., size) from inside the index file and the rest (replication, owner, times) from the properties of the index file itself, which is not correct. For example, if you do an ls on a har:// path today, the contents show up with a replication factor of 10 (the default for the index file), although the part file containing the data will probably have a replication factor of 3 (the HDFS default).
Keeping the properties does not really prevent someone who has access to the har directory and its files from accessing the part file that contains the data, but it would help a lot if we wanted to unhar the files at some point and restore their original properties. That's exactly what I'm proposing in this JIRA.
In short, I would like to store the properties in the index file, list them on an ls command, and return them correctly from getFileStatus(), not much more than that. I think this would be a good starting point for future, more complicated extensions.
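To make the idea concrete, here is a rough sketch of what storing the properties per entry could look like. It is only an illustration: the class name HarIndexProps, the Entry holder, and the space-separated line layout are all made up for this comment and are not the actual HarFileSystem code or the real index format. The point is just that every field a FileStatus needs is written next to the entry and read back from there, rather than taken from the index file's own status.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: NOT the real HarFileSystem code or the real har index layout.
final class HarIndexProps {

    /** Properties of one archived file, captured when the archive is built. */
    public static final class Entry {
        public final String path;
        public final long length;
        public final short replication;
        public final long modificationTime;
        public final long accessTime;
        public final String owner;
        public final String group;
        public final short permission; // POSIX mode bits, e.g. 0644

        public Entry(String path, long length, short replication, long modificationTime,
                     long accessTime, String owner, String group, short permission) {
            this.path = path;
            this.length = length;
            this.replication = replication;
            this.modificationTime = modificationTime;
            this.accessTime = accessTime;
            this.owner = owner;
            this.group = group;
            this.permission = permission;
        }
    }

    // URL-encode free-form strings so they cannot contain the field separator.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    private static String dec(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    /** Serialize one entry as a single space-separated line (assumed layout). */
    public static String format(Entry e) {
        return String.join(" ",
                enc(e.path), Long.toString(e.length), Short.toString(e.replication),
                Long.toString(e.modificationTime), Long.toString(e.accessTime),
                enc(e.owner), enc(e.group), Integer.toOctalString(e.permission));
    }

    /** Parse a line written by format(); all properties come from the line itself. */
    public static Entry parse(String line) {
        String[] f = line.trim().split(" ");
        if (f.length != 8) {
            throw new IllegalArgumentException("expected 8 fields, got " + f.length);
        }
        return new Entry(dec(f[0]), Long.parseLong(f[1]), Short.parseShort(f[2]),
                Long.parseLong(f[3]), Long.parseLong(f[4]),
                dec(f[5]), dec(f[6]), Short.parseShort(f[7], 8));
    }

    public static void main(String[] args) {
        Entry original = new Entry("/user/alice/data.txt", 1024L, (short) 3,
                1000L, 2000L, "alice", "staff", (short) 0644);
        System.out.println(format(original));
    }
}
```

With something along these lines, getFileStatus() on a har:// path could build its result from the parsed entry (replication 3, the original owner, the original times) and never consult the index file's own replication or ownership. An unhar tool could use the same parsed values to restore the original properties.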