Samza / SAMZA-968

SequenceFileHdfsWriter does not close file properly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0, 0.10.1
    • Fix Version/s: 0.10.1
    • Component/s: container
    • Labels:
      None
    • Flags:
      Patch

      Description

      From dev@samza.apache.org:

      Hi, Benjamin,

      Thanks a lot for reporting this! It makes sense from reading the posts.
      Could you open a JIRA? Would you be interested in assigning it to yourself
      and contributing the fix?

      Thanks a lot again!

      -Yi

      > Hello,
      >
      > I am working on a project where we are integrating Samza and Hive. As part
      > of this project, we ran into an issue where sequence files written from
      > Samza were taking a long time (hours) to completely sync with HDFS.
      >
      > After some Googling and digging into the code, it appears that the issue
      > is here:
      >
      > https://github.com/apache/samza/blob/master/samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala#L111
      >
      > Writer.stream(dfs.create(path)) implies that the caller of
      > dfs.create(path) is responsible for closing the created stream explicitly.
      > This doesn't happen, and the SequenceFileHdfsWriter call to close will only
      > flush the stream.
      >
      > I believe the correct line should be:
      >
      > Writer.file(path)
      >
      > Or, SequenceFileHdfsWriter should explicitly track and close the stream.
      >
      > Thanks!
      >
      > Ben
      >
      > Reference material:
      >
      > http://stackoverflow.com/questions/27916872/why-the-sequencefile-is-truncated
      >
      > https://apache.googlesource.com/hadoop-common/+/HADOOP-6685/src/java/org/apache/hadoop/io/SequenceFile.java#1238
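
The two fixes proposed in the report can be sketched roughly as follows. This is a minimal illustration, not the attached patch: it assumes Hadoop 2.x on the classpath, and the object name, method names, key/value types, and sample record are hypothetical, chosen only for the example. It relies on the behavior described above, where a writer built with Writer.stream(...) only flushes the stream on close, while a writer built with Writer.file(...) creates and closes the stream itself.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataOutputStream, Path}
import org.apache.hadoop.io.{BytesWritable, SequenceFile}
import org.apache.hadoop.io.SequenceFile.Writer

// Hypothetical sketch; names here are illustrative and not taken from Samza.
object SequenceFileCloseSketch {

  private def sample(s: String) = new BytesWritable(s.getBytes("UTF-8"))

  // Option 1: pass the path, so the writer creates and owns the stream.
  // writer.close() then closes the underlying file as well.
  def writeWithFile(conf: Configuration, path: Path): Unit = {
    val writer = SequenceFile.createWriter(
      conf,
      Writer.file(path),
      Writer.keyClass(classOf[BytesWritable]),
      Writer.valueClass(classOf[BytesWritable]))
    try {
      writer.append(sample("key"), sample("value"))
    } finally {
      writer.close()
    }
  }

  // Option 2: keep Writer.stream(...), but track the stream and close it
  // explicitly, because the writer does not own it.
  def writeWithStream(conf: Configuration, path: Path): Unit = {
    val fs = path.getFileSystem(conf)
    val out: FSDataOutputStream = fs.create(path)
    try {
      val writer = SequenceFile.createWriter(
        conf,
        Writer.stream(out),
        Writer.keyClass(classOf[BytesWritable]),
        Writer.valueClass(classOf[BytesWritable]))
      try {
        writer.append(sample("key"), sample("value"))
      } finally {
        writer.close() // flushes only; the stream stays open
      }
    } finally {
      out.close() // the caller created the stream, so the caller closes it
    }
  }
}
```

Option 1 is the smaller change, since it removes the need to track the stream at all. Option 2 may be preferable only if the caller needs the stream for some other reason, such as passing custom create options.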

        Attachments

        1. SAMZA-968.patch
          0.8 kB
          Benjamin Smith

            People

            • Assignee: MLBenjii Benjamin Smith
            • Reporter: MLBenjii Benjamin Smith
            • Votes: 0
            • Watchers: 4

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified