Flume / FLUME-1391

Use sync() instead of syncFs() in HDFS Sink to be compatible with hadoop 0.20.2


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: Sinks+Sources

    Description

      The HDFS sink calls syncFs() in HDFSSequenceFile, but syncFs() is not available in legacy Hadoop 0.20.2, which may still be widely used. The sync() method is available in all Hadoop versions. Moreover, in Hadoop's SequenceFile.Writer, syncFs() is itself implemented by calling sync() on the underlying output stream:

          /** create a sync point */
          public void sync() throws IOException {
            if (sync != null && lastSyncPos != out.getPos()) {
              out.writeInt(SYNC_ESCAPE);                // mark the start of the sync
              out.write(sync);                          // write sync
              lastSyncPos = out.getPos();               // update lastSyncPos
            }
          }
      
          /** flush all currently written data to the file system */
          public void syncFs() throws IOException {
            if (out != null) {
              out.sync();                               // flush contents to file system
            }
          }
      

      Therefore, calling sync() in HDFSSequenceFile may be the better choice, since it also works with Hadoop 0.20.2:

        @Override
        public void sync() throws IOException {
          // writer.syncFs(); // only available in hadoop 0.20.205.0+
          writer.sync();      // available in hadoop 0.20.2+
        }
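
An alternative to hard-coding one of the two calls would be to probe at runtime for whichever method the writer exposes. The following is only a minimal sketch of that idea and is not part of the attached patch: LegacyWriter, ModernWriter, and SyncCompat are hypothetical stand-ins invented for illustration, not Hadoop or Flume classes, and the sketch says nothing about the durability guarantees of either call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical stand-in for a writer from an older release: sync() only. */
class LegacyWriter {
  int syncCalls = 0;

  public void sync() throws IOException {
    syncCalls++;
  }
}

/** Hypothetical stand-in for a newer writer that also offers syncFs(). */
class ModernWriter extends LegacyWriter {
  int syncFsCalls = 0;

  public void syncFs() throws IOException {
    syncFsCalls++;
  }
}

/** Hypothetical helper: prefers syncFs() and falls back to sync(). */
final class SyncCompat {
  private SyncCompat() {}

  /** Invokes syncFs() if the writer has it, otherwise sync(); returns the name used. */
  static String flush(Object writer) {
    Method method;
    String name;
    try {
      method = writer.getClass().getMethod("syncFs");
      name = "syncFs";
    } catch (NoSuchMethodException noSyncFs) {
      try {
        method = writer.getClass().getMethod("sync");
        name = "sync";
      } catch (NoSuchMethodException noSync) {
        throw new IllegalArgumentException("writer has neither syncFs() nor sync()", noSync);
      }
    }
    try {
      method.invoke(writer);
    } catch (InvocationTargetException e) {
      // Unwrap the exception thrown by the writer itself.
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw new UncheckedIOException((IOException) cause);
      }
      throw new IllegalStateException(cause);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
    return name;
  }
}
```

With these stand-ins, SyncCompat.flush(new LegacyWriter()) ends up calling sync(), while SyncCompat.flush(new ModernWriter()) calls syncFs(). The reflective lookup costs more than a direct call, so a real implementation would probably resolve the method once and cache it.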
      

      Attachments

        1. HDFSSink-for-hadoop-0.20.2.patch
          0.5 kB
          Yongkun Wang


            People

              Assignee: Yongkun Wang
              Reporter: Yongkun Wang