Repsonse to devaraj's comments -
> 1) The query part in the creation of the URI can be removed (in fact we probably should flag an error if the har path contains a '?' since it is not a valid Path)
> 2) decodeURI should be done first and then the har archive path can be extracted
> 3) getHarAuth needn't be parsing the uri everytime since it is constant. The auth can just be stored in a class variable.
> 4) open() & other filesystem calls should support taking just the fragment path to a file within the archive
> 5) why is fileStatusInIndex storing the Store object in a list while going through the master index? Isn't the list going to be always of size 1 (if the file is present in the archive)
no the list can be more than one since hashcode will have collisions and might end up between different buckets
> 6) the index files are not closed in the fileStatusInIndex call. This might lead to problems in the cases where the underlying filesystem is the localfs (where open actually returns a filedescriptor). But I am also not sure whether we should open and close on every call to fileStatusInIndex. Can we somehow cache the handles to the index files and reuse them.
for now I will just be closing and opening the files. Ill leave this optimization for later.
> 7) When we create a part file, can we record the things like replication factor, permissions, etc. and emit them just like we emit the other info like partfilename, etc. during archive creation and store them in the index file. That way we don't have to fake everything in the listStatus.
this was stated in the design that we will be ignoring permissions in this version. In later versions we can persist the permissions as well.
> 8) In listStatus, the start and end braces are missing for the if/else block
> In listStatus, the check hstatus.isDir()?0:hstatus.getLength() seems redundant. hstatus.isDir is always going to be false
> 10) I don't understand clearly why makeRelative is done in the listStatus and getFileStatus calls
Its just to join two paths that are assolute.
SO a path in archive is persisted as /user/mahadev so to have this as the later componet of the archive its just made relative to create a new Path
> Do you enforce the .har in the archive name when it is created?
> 1) In writeTopLevelDirs, remove the comment "invert the paths"
> 2) the SRC_LIST_LABEL file needs to have a high replication factor (maybe 10 or something)
> Use NullOutputFormat instead of the HarOutputFormat
Did not know we had a nulloutputformat!!
> 4) Overall the coding convention is that we have starting/terminating braces even for single statement blocks. Please update the code w.r.t that.
> Overall, the path manipulations (like makeRelative/absolute) confuses me. It'd be nice to cleanup the code in that aspect if possible.
will try to clean up the code.
> In the testcase, it'd be nice if you read the whole file content instead of just 4 bytes, and then validate. That'd tell you that there is no extra (spurious) bytes in the archived files, right?