ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-2003

Missing fsync() on the logs parent directory

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.6
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      After studying the steps ZooKeeper takes to update the logs we found the following bug. The bug may not manifest in the current file system implementations, but it violates the POSIX recommendations and may be an issue in some file systems.

      Looking at the strace of zookeeper we see the following:
      mkdir(v)
      create(v/log)
      append(v/log)
      trunk(v/log)
      write(v/log)
      fdatasync(v/log)

      Although the data is fdatasynced to the log, the parent directory was never fsynced, consequently in case of a crash, the parent directory or the log file may be lost, as the parent directory and file metadata were never persisted on disk.
      To be safe, both the log directory, and parent directory of the log directory should be fsynced as well.

        Activity

        Hide
        Raul Gutierrez Segales added a comment -

        To be safe, both the log directory, and parent directory of the log directory should be fsynced as well.

        That sounds like more than needed. We only need to fsync the parent directory when the logfile has just been created, every other append doesn't need an fsync of the parent dir (maybe you meant that, but the last statement seems to be overreaching).

        See: http://www.quora.com/Linux/When-should-you-fsync-the-containing-directory-in-addition-to-the-file-itself

        Show
        Raul Gutierrez Segales added a comment - To be safe, both the log directory, and parent directory of the log directory should be fsynced as well. That sounds like more than needed. We only need to fsync the parent directory when the logfile has just been created , every other append doesn't need an fsync of the parent dir (maybe you meant that, but the last statement seems to be overreaching). See: http://www.quora.com/Linux/When-should-you-fsync-the-containing-directory-in-addition-to-the-file-itself
        Hide
        Flavio Junqueira added a comment -

        Fair enough, let's go into more detail here. According to the analysis in the description, I read two potential problems:

        • The directory data of a newly created directory isn't persisted to disk
        • A newly created log file isn't persisted as a directory entry

        Both cases may lead to the loss of an otherwise persisted log file and they imply that we need to fsync the directory data. There is a third point that I believe is important, which is making sure that the metadata is updated when we pad the log file.

        The fsync documentation says that we need to fsync the directory as well to make sure that the directory change is persisted to disk. You claim that it is not possible to do this in java, but I think that with Java 7 we can fsync directories, no?

        It seems that there are two parts to this discussion. First, we need to understand to what extent this is really a problem in the current code. I must say that I haven't thought about this part of ZK in a while, so I don't have it entirely fresh. Second, assuming there is something to be fixed, we need to determine how to do it in Java.

        Show
        Flavio Junqueira added a comment - Fair enough, let's go into more detail here. According to the analysis in the description, I read two potential problems: The directory data of a newly created directory isn't persisted to disk A newly created log file isn't persisted as a directory entry Both cases may lead to the loss of an otherwise persisted log file and they imply that we need to fsync the directory data. There is a third point that I believe is important, which is making sure that the metadata is updated when we pad the log file. The fsync documentation says that we need to fsync the directory as well to make sure that the directory change is persisted to disk. You claim that it is not possible to do this in java, but I think that with Java 7 we can fsync directories, no? It seems that there are two parts to this discussion. First, we need to understand to what extent this is really a problem in the current code. I must say that I haven't thought about this part of ZK in a while, so I don't have it entirely fresh. Second, assuming there is something to be fixed, we need to determine how to do it in Java.
        Hide
        Liang Xie added a comment -

        Seems there's no JAVA api could do this, one possible solution is JNI.
        ps: It's still safe if using XFS but really need a directory fsync w/ ext4, in theory.
        Flavio Junqueira, i think you missed Samer's point...

        Show
        Liang Xie added a comment - Seems there's no JAVA api could do this, one possible solution is JNI. ps: It's still safe if using XFS but really need a directory fsync w/ ext4, in theory. Flavio Junqueira , i think you missed Samer's point...
        Hide
        Flavio Junqueira added a comment -

        I suppose this is because we force to disk with the metadata option set to false (FileTxnLog.commit):

        log.getChannel().force(false);
        

        Are you suggesting that we force to disk with the metadata option set to true upon creating a new file? We do flush the BufferedOutputStream associated to the FileOutputStream, but that only flushes the buffer of BOS to FOS, my understanding of the documentation is that it doesn't induce a force.

        Show
        Flavio Junqueira added a comment - I suppose this is because we force to disk with the metadata option set to false (FileTxnLog.commit): log.getChannel().force(false); Are you suggesting that we force to disk with the metadata option set to true upon creating a new file? We do flush the BufferedOutputStream associated to the FileOutputStream, but that only flushes the buffer of BOS to FOS, my understanding of the documentation is that it doesn't induce a force.
        Hide
        Liang Xie added a comment -

        nice finding!

        Show
        Liang Xie added a comment - nice finding!

          People

          • Assignee:
            Unassigned
            Reporter:
            Samer Al-Kiswany
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development