Thanks a lot for reporting this! It makes sense from reading the posts.
Could you open a JIRA? Would you be interested in assigning it to yourself
and contributing the fix?
Thanks a lot again!
> I am working on a project where we are integrating Samza and Hive. As part
> of this project, we ran into an issue where sequence files written from
> Samza were taking a long time (hours) to completely sync with HDFS.
> After some Googling and digging into the code, it appears that the issue
> is here:
> Writer.stream(dfs.create(path)) implies that the caller of
> dfs.create(path) is responsible for closing the created stream explicitly.
> This doesn't happen, and the SequenceFileHdfsWriter call to close will only
> flush the stream.
> I believe the correct line should be:
> Or, SequenceFileHdfsWriter should explicitly track and close the stream.
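
To illustrate the second option (tracking and closing the stream explicitly),
here is a minimal Java sketch. It uses plain java.io stand-ins rather than the
real Hadoop or Samza classes: TrackedStream, NonOwningWriter, and OwningWriter
are hypothetical names, and the flush-only close() simply mirrors the behavior
described in the report above. It is not the actual SequenceFileHdfsWriter code.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamOwnershipSketch {

    // Hypothetical stand-in for the stream returned by dfs.create(path);
    // it records whether close() was ever called.
    static class TrackedStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical stand-in for a writer built on a caller-owned stream:
    // as described in the report, its close() only flushes.
    static class NonOwningWriter implements Closeable {
        private final OutputStream out;

        NonOwningWriter(OutputStream out) {
            this.out = out;
        }

        void append(String record) throws IOException {
            out.write(record.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            out.flush(); // the underlying stream stays open
        }
    }

    // Sketch of the second option: keep a reference to the stream
    // and close it explicitly after the inner writer.
    static class OwningWriter implements Closeable {
        private final OutputStream stream;
        private final NonOwningWriter writer;

        OwningWriter(OutputStream stream) {
            this.stream = stream;
            this.writer = new NonOwningWriter(stream);
        }

        void append(String record) throws IOException {
            writer.append(record);
        }

        @Override
        public void close() throws IOException {
            try {
                writer.close();
            } finally {
                stream.close(); // runs even if the writer's close() throws
            }
        }
    }

    // Returns whether the stream ended up closed when only the inner writer is closed.
    static boolean closedByNonOwningWriter() {
        TrackedStream stream = new TrackedStream();
        try (NonOwningWriter writer = new NonOwningWriter(stream)) {
            writer.append("record");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.closed;
    }

    // Returns whether the stream ended up closed when the tracking wrapper is closed.
    static boolean closedByOwningWriter() {
        TrackedStream stream = new TrackedStream();
        try (OwningWriter writer = new OwningWriter(stream)) {
            writer.append("record");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.closed;
    }

    public static void main(String[] args) {
        System.out.println("flush-only close closed the stream: " + closedByNonOwningWriter());
        System.out.println("tracked close closed the stream: " + closedByOwningWriter());
    }
}
```

The try/finally in OwningWriter.close() is one way to make sure the stream is
released even when the inner writer fails on close; whether the real fix should
live in the writer or at the call site is a design choice for the JIRA.
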
> Reference material: