[SAMZA-968] SequenceFileHdfsFileWriter does not close file properly - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.10.0, 0.10.1
Fix Version/s: 0.10.1
Component/s: container
Labels: None
Flags: Patch

Description

From dev@samza.apache.org:

Hi, Benjamin,

Thanks a lot for reporting this! It makes sense from reading the posts.
Could you open a JIRA? Are you interested in assigning it to yourself and
contributing the fix?

Thanks a lot again!

-Yi

> Hello,
>
> I am working on a project where we are integrating Samza and Hive. As part
> of this project, we ran into an issue where sequence files written from
> Samza were taking a long time (hours) to completely sync with HDFS.
>
> After some Googling and digging into the code, it appears that the issue
> is here:
>
> https://github.com/apache/samza/blob/master/samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala#L111
>
> Writer.stream(dfs.create(path)) implies that the caller of
> dfs.create(path) is responsible for closing the created stream explicitly.
> This doesn't happen, and the SequenceFileHdfsWriter call to close will only
> flush the stream.
>
> I believe the correct line should be:
>
> Writer.file(path)
>
> Or, SequenceFileHdfsWriter should explicitly track and close the stream.
>
> Thanks!
>
> Ben
>
> Reference material:
>
> http://stackoverflow.com/questions/27916872/why-the-sequencefile-is-truncated
>
> https://apache.googlesource.com/hadoop-common/+/HADOOP-6685/src/java/org/apache/hadoop/io/SequenceFile.java#1238
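The ownership rule described in the report can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ModelWriter`, `TrackingStream`, `forStream`, and `forOwnedStream` are hypothetical names invented for illustration, and the code uses only `java.io`, not the Hadoop `SequenceFile` API. It models the behavior the report describes: a writer built around a caller-supplied stream only flushes on `close()`, while a writer that owns its stream closes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamOwnershipSketch {

    /** Hypothetical in-memory stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical writer modeling the ownership rule from the report. */
    static class ModelWriter implements AutoCloseable {
        private final OutputStream out;
        private final boolean ownsStream;

        private ModelWriter(OutputStream out, boolean ownsStream) {
            this.out = out;
            this.ownsStream = ownsStream;
        }

        /** Models Writer.stream(...): the caller supplied the stream and must close it. */
        static ModelWriter forStream(OutputStream out) {
            return new ModelWriter(out, false);
        }

        /** Models Writer.file(path): the writer is treated as owner of the stream. */
        static ModelWriter forOwnedStream(OutputStream out) {
            return new ModelWriter(out, true);
        }

        void append(byte[] record) throws IOException {
            out.write(record);
        }

        @Override
        public void close() throws IOException {
            if (ownsStream) {
                out.close();   // owner closes the underlying stream
            } else {
                out.flush();   // non-owner only flushes; the stream stays open
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = {1, 2, 3};

        // The reported bug: the stream is handed over and nobody closes it.
        TrackingStream leaked = new TrackingStream();
        ModelWriter w1 = ModelWriter.forStream(leaked);
        w1.append(record);
        w1.close();
        System.out.println("caller-supplied stream closed: " + leaked.closed);

        // First suggested fix: let the writer own the stream.
        TrackingStream owned = new TrackingStream();
        ModelWriter w2 = ModelWriter.forOwnedStream(owned);
        w2.append(record);
        w2.close();
        System.out.println("writer-owned stream closed: " + owned.closed);

        // Second suggested fix: track the stream and close it explicitly.
        TrackingStream tracked = new TrackingStream();
        try (TrackingStream s = tracked; ModelWriter w3 = ModelWriter.forStream(s)) {
            w3.append(record);
        }
        System.out.println("explicitly tracked stream closed: " + tracked.closed);
    }
}
```

In the third case, try-with-resources closes the resources in reverse order of declaration, so the writer flushes first and the stream is closed afterwards. This corresponds to the "explicitly track and close the stream" option in the report.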

Attachments

SAMZA-968.patch | 16/Jun/16 19:05 | 0.8 kB | Benjamin Smith

People

Assignee: Benjamin Smith
Reporter: Benjamin Smith
Votes: 0
Watchers: 4

Dates

Created: 16/Jun/16 17:44
Updated: 15/Jul/16 23:44
Resolved: 21/Jun/16 22:37

Time Tracking

Estimated: 24h
Remaining: 24h
Logged: Not Specified