Kafka / KAFKA-1036

Unable to rename replication offset checkpoint in windows

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.8.1
    • Component/s: None
    • Labels:
    • Environment:
      windows

      Description

      Although there was a fix for checkpoint file renaming on Windows that tries to delete the existing checkpoint file if the rename failed, I'm still seeing rename errors on Windows even though the destination file doesn't exist.

      A bit of investigation shows that the rename fails because the Kafka JVM still holds a file lock on the tmp file.

      Attaching a patch that calls an explicit writer.close so that the lock is released and the file can be renamed.
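
For illustration, the idea of closing the writer before renaming can be sketched as follows. This is a minimal sketch, not Kafka's actual checkpoint code: the class name CheckpointSketch and the method writeCheckpoint are hypothetical, and it assumes the temp file lives next to the target with a ".tmp" suffix.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper for illustration only; not taken from the Kafka code base.
final class CheckpointSketch {
    static void writeCheckpoint(File target, String contents) throws IOException {
        // Assumption: the temp file sits beside the target with a ".tmp" suffix.
        File temp = new File(target.getAbsolutePath() + ".tmp");

        // try-with-resources closes the writer, and with it the open file handle,
        // before the rename below is attempted.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(temp))) {
            writer.write(contents);
            writer.flush();
        }

        // File.renameTo signals failure only through its boolean return value.
        if (!temp.renameTo(target)) {
            throw new IOException("File rename from " + temp + " to " + target + " failed.");
        }
    }
}
```

Whether an open handle actually blocks the rename depends on the platform, which is why the close happens unconditionally before the rename rather than only after a failed attempt.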

        Activity

        Jun Rao added a comment -

        Hi, Tim,

        Do you still want to provide a patch? I saw a patch attached and then deleted.

        Thanks,

        Timothy Chen added a comment -

        Hi Jun, I just realized I don't have clearance to provide a patch yet. It would be much easier if you could help fix this, since it's just a one-line fix.

        Jun Rao added a comment -

        Hmm, not sure how to patch this since we close the writer before renaming the file.

        Jay Kreps added a comment -

        I'm a little confused. I don't see any file locking happening in our code. The lock I see is just an in-memory lock and should prevent the file from being deleted.

        So perhaps the problem you are describing is that we don't close the file until after the file move? This is legitimate on Unix but perhaps not on Windows.

        Jun Rao added a comment -

        This seems to be only affecting trunk. So, moving to 0.8.1.

        Jay Kreps added a comment -

        Checked in a fix on trunk. Note that I don't have access to a Windows box, so I can't actually validate the fix. If anyone who does have access could give this a spin, that would be great.

        Jan added a comment -

        Hi Jay,

        I just stumbled upon this issue, since I am on Windows. I had this problem with the current beta, so I checked out trunk, but I still face the same issue:
        [2013-10-25 12:43:43,422] FATAL [Replica Manager on Broker 0]: Error writing to highwatermark file: (kafka.server.ReplicaManager)
        java.io.IOException: File rename from D:\Databases\Kafka\kafka-logs\replication-offset-checkpoint.tmp to D:\Databases\Kafka\kafka-logs\replication-offset-checkpoint failed.

        I also tried the NIO move functionality to see if that solved the problem, but it fails for the same reason.

        Thanks a lot and regards

        Jan

        David Lao added a comment -

        Hi Jay, can you provide a patch for 0.8 as well? I'm running into a similar issue on Windows.

        Timothy Chen added a comment -

        Hi Jay,

        The code isn't doing any locking, but on Windows it looks like there is still a pending lock on the file itself if you don't close the writer, as seen via the file monitor.

        That's why I needed to add an extra writer.close after the rename fails.

        Tim

        Jan added a comment -

        Hi all,

        I think the problem is that the second check for renameTo == true fails although the rename was executed properly. When I remove the second check, it works without problems and the file gets renamed properly (see the attached patch). I guess the root cause of this problem is the platform dependency of the old File API:
        "Many aspects of the behavior of this method are inherently platform-dependent: The rename operation might not be able to move a file from one filesystem to another, it might not be atomic, and it might not succeed if a file with the destination abstract pathname already exists."

        Maybe it would be a solution to use NIO instead?

        Best regards

        Jan
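
As a rough illustration of the NIO alternative mentioned above, the sketch below uses java.nio.file.Files.move, which throws an IOException describing the failure instead of returning false like File.renameTo. The class NioRenameSketch and the method tryMove are hypothetical names, and this is not the patch attached to this issue. It does not by itself avoid an open-handle problem on Windows; it only makes the reason for a failure visible.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not taken from the Kafka code base.
final class NioRenameSketch {
    // Returns null on success, otherwise a short description of why the move failed.
    static String tryMove(Path source, Path target) {
        try {
            // REPLACE_EXISTING overwrites the target if it is already present.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            return null;
        } catch (IOException e) {
            // Unlike File.renameTo, the exception type and message explain the failure.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```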

        Jan added a comment -

        The second check for renaming fails on windows, although the renaming worked.

        Neha Narkhede added a comment -

        Jan, are you sure this is required? If we always delete the destination file and then execute renameTo, it should work in all cases, no? Sriram Subramanian, what do you think?

        Jun Rao added a comment -

        We are relying on file renaming being an atomic operation. So, if supported, we should still do a rename instead of a deletion followed by creation. The issue with the latter is that if the broker crashes between the two operations, the broker is left with no checkpoint file.
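
The preference for an atomic rename over delete-then-create could be sketched like this. It is a minimal sketch under stated assumptions, not the fix checked in for this issue: AtomicSwapSketch and replace are hypothetical names, and the non-atomic fallback is an illustrative choice for platforms that report no atomic move support.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not taken from the Kafka code base.
final class AtomicSwapSketch {
    static void replace(Path temp, Path target) throws IOException {
        try {
            // Ask for an atomic move so a crash never leaves the target missing.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Assumed fallback: a non-atomic replace, which reopens the crash window
            // described in the comment above.
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The Java documentation states that, with ATOMIC_MOVE, whether an existing target is replaced or an IOException is thrown is implementation specific, so the behavior should be verified on each platform the broker runs on.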


          People

          • Assignee: Unassigned
          • Reporter: Timothy Chen
          • Votes: 0
          • Watchers: 6
