Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: core, log
    • Labels:
    • Environment:
      Windows, Linux, Mac OS

      Description

      While I was studying how MappedByteBuffer works, I saw a sharing runtime exception on Windows. I applied what I learned to generate a patch which uses an internal open JDK API to solve this problem.

      Following Jay's advice, I made a helper method called tryUnmap().

      1. unmap-v5.patch
        12 kB
        Elizabeth Wei
      2. KAFKA-1008-v6.patch
        15 kB
        Jay Kreps
      3. KAFKA-trunk-1008-v7.patch
        32 kB
        Elizabeth Wei
      4. KAFKA-0.8-1008-v7.patch
        17 kB
        Jay Kreps
      5. KAFKA-0.8-1008-v8.patch
        18 kB
        Jay Kreps
      6. KAFKA-1008-v9-trunk.patch
        20 kB
        Jay Kreps

        Issue Links

          Activity

          Hide
          Elizabeth Wei added a comment -

          Patch file

          Show
          Elizabeth Wei added a comment - Patch file
          Hide
          Guozhang Wang added a comment -

          Will this patch restrict to Sun JVM-only environments?

          Show
          Guozhang Wang added a comment - Will this patch restrict to Sun JVM-only environments?
          Hide
          Elizabeth Wei added a comment -

          For non Sun JVM environments, tryUnmap is no-op for now; we can add support for other JVM environments later on.

          Show
          Elizabeth Wei added a comment - For non Sun JVM environments, tryUnmap is no-op for now; we can add support for other JVM environments later on.
          Hide
          Jay Kreps added a comment -

          Hey Elizabeth, thanks for the patch!

          Three follow-up items for us to take this:
          1. If we refer to sun.nio.ch.DirectBuffer that introduces a build-time dependency on Sun java. I think that is probably okay. But what is the behavior of tryUnmap on a non-sun jvm at runtime (or if sun ever changes their implementation)? My suspicion is that we would get a ClassNotFoundException for sun.nio.ch.DirectBuffer right? I think we would be better off wrapping everything inside tryUnmap inside a try/catch and trace logging if there is an exception.
          2. It looks like you added back in a call to flush which we removed as part of another patch. Probably accidental, right?
          3. I would like to understand the security issue on the Sun ticket for unmap here: http://bugs.sun.com/view_bug.do?bug_id=4724038 If we don't understand what the concern is then there is a possibility we are introducing a security problem (though I don't think so...).

          Show
          Jay Kreps added a comment - Hey Elizabeth, thanks for the patch! Three follow-up items for us to take this: 1. If we refer to sun.nio.ch.DirectBuffer that introduces a build-time dependency on Sun java. I think that is probably okay. But what is the behavior of tryUnmap on a non-sun jvm at runtime (or if sun ever changes their implementation)? My suspicion is that we would get a ClassNotFoundException for sun.nio.ch.DirectBuffer right? I think we would be better off wrapping everything inside tryUnmap inside a try/catch and trace logging if there is an exception. 2. It looks like you added back in a call to flush which we removed as part of another patch. Probably accidental, right? 3. I would like to understand the security issue on the Sun ticket for unmap here: http://bugs.sun.com/view_bug.do?bug_id=4724038 If we don't understand what the concern is then there is a possibility we are introducing a security problem (though I don't think so...).
          Hide
          Elizabeth Wei added a comment -

          Thanks for the feedback!

          1 - Currently tryUnmap does a type checking. If the super class of "m" is changed in the future, the type check will be false, and there will be no casting errors like ClassNotFoundException at runtime.

          2 - I removed the flush(). I probably used an older copy a couple of weeks ago.

          3 - Reading through the bug report, I'm not sure if the cases matter in terms of Kafka - I think all the threads in a Kafka process should be trusted and the race condition between the unmap/remap shouldn't happen if coded properly. I noticed that in the most recent version, the resize method is synchronized, which should prevent multiple threads trying to resize/unmap the files.

          Show
          Elizabeth Wei added a comment - Thanks for the feedback! 1 - Currently tryUnmap does a type checking. If the super class of "m" is changed in the future, the type check will be false, and there will be no casting errors like ClassNotFoundException at runtime. 2 - I removed the flush(). I probably used an older copy a couple of weeks ago. 3 - Reading through the bug report, I'm not sure if the cases matter in terms of Kafka - I think all the threads in a Kafka process should be trusted and the race condition between the unmap/remap shouldn't happen if coded properly. I noticed that in the most recent version, the resize method is synchronized, which should prevent multiple threads trying to resize/unmap the files.
          Hide
          Jay Kreps added a comment -

          Hey Elizabeth, thanks for the patch.

          Two issues.

          The first is that I think that instanceof check will actually throw an exception on a non-sun jvm. See the experiment below and see what you think.

          The second issue is actually a more subtle thing. Currently the synchronization is really just for changes to the buffer, the read-access are lock-free (which is good). My concern is what happens if we force clean a buffer while a read is occurring? Not sure if this can happen or not, but I think we need to somehow be sure it can't.

          Here was my test:
          $ cat /tmp/code/Test.java
          import test.MyTest;

          public class Test {
          public static void main(String[] args)

          { Object o = new Object(); if(o instanceof MyTest) System.out.println("Hello"); }

          }

          $ cat /tmp/code/test/MyTest.java
          package test;

          public class MyTest {

          }

          I do
          javac /tmp/code/test/MyTest.java
          javac -cp /tmp/code /tmp/code/Test.java
          rm /tmp/code/test/MyTest.class
          $ java -cp /tmp/code Test
          Exception in thread "main" java.lang.NoClassDefFoundError: test/MyTest
          at Test.main(Test.java:6)
          Caused by: java.lang.ClassNotFoundException: test.MyTest
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
          ... 1 more

          We also can't really be too sure what kind of exceptions clean() might throw.

          Show
          Jay Kreps added a comment - Hey Elizabeth, thanks for the patch. Two issues. The first is that I think that instanceof check will actually throw an exception on a non-sun jvm. See the experiment below and see what you think. The second issue is actually a more subtle thing. Currently the synchronization is really just for changes to the buffer, the read-access are lock-free (which is good). My concern is what happens if we force clean a buffer while a read is occurring? Not sure if this can happen or not, but I think we need to somehow be sure it can't. Here was my test: $ cat /tmp/code/Test.java import test.MyTest; public class Test { public static void main(String[] args) { Object o = new Object(); if(o instanceof MyTest) System.out.println("Hello"); } } $ cat /tmp/code/test/MyTest.java package test; public class MyTest { } I do javac /tmp/code/test/MyTest.java javac -cp /tmp/code /tmp/code/Test.java rm /tmp/code/test/MyTest.class $ java -cp /tmp/code Test Exception in thread "main" java.lang.NoClassDefFoundError: test/MyTest at Test.main(Test.java:6) Caused by: java.lang.ClassNotFoundException: test.MyTest at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) ... 1 more We also can't really be too sure what kind of exceptions clean() might throw.
          Hide
          Elizabeth Wei added a comment -

          Thanks, I added the try/catch to handle the exceptions.

          I found the following coding pattern
          val idx = mmap.duplicate
          in this file. It seems like it always makes a copy of the buffer for "read".

          Show
          Elizabeth Wei added a comment - Thanks, I added the try/catch to handle the exceptions. I found the following coding pattern val idx = mmap.duplicate in this file. It seems like it always makes a copy of the buffer for "read".
          Hide
          Jay Kreps added a comment -

          But my understanding is that these copies are just copies of pointer object (i.e. a position, limit, mark, etc) so if you unmap the underlying mmap while reads are occurring something bad will happen.

          Another way to say this is what happens in the following code execution:
          MappedByteBuffer orig = // map file
          MappedByteBuffer copy = orig.duplicate()
          (orig.asInstanceOf[sun.nio.ch.DirectBuffer]).cleaner().clean()
          copy.get()

          My suspicion is that something terrible will happen, but I could be wrong.

          Show
          Jay Kreps added a comment - But my understanding is that these copies are just copies of pointer object (i.e. a position, limit, mark, etc) so if you unmap the underlying mmap while reads are occurring something bad will happen. Another way to say this is what happens in the following code execution: MappedByteBuffer orig = // map file MappedByteBuffer copy = orig.duplicate() (orig.asInstanceOf [sun.nio.ch.DirectBuffer] ).cleaner().clean() copy.get() My suspicion is that something terrible will happen, but I could be wrong.
          Hide
          Elizabeth Wei added a comment -

          That's right, the duplicate is a shallow copy of the buffer. Do you have a case where the buffer is being used while resizing? I did a quick grep, and it looks like resize is only being used during logfile loading or truncation.

          Show
          Elizabeth Wei added a comment - That's right, the duplicate is a shallow copy of the buffer. Do you have a case where the buffer is being used while resizing? I did a quick grep, and it looks like resize is only being used during logfile loading or truncation.
          Hide
          Jay Kreps added a comment -

          Yeah, the problem is reads continue while the log is being rolled.

          Here are a bunch of possible solutions I can think of
          1. Lock reads
          2. Delay the truncate until the larger mmap is collected

          For (2) the problem is that it actually interferes with the recoverability of the log. If we have old log segments with a bunch of unfilled bytes in their index segment we need to recover those segments. But currently we only recover from the last flush point.

          So I think we need to lock read access. This isn't the end of the world but I would prefer to do this only on windows. My suggestion on implementation would be
          1. Change from synchronized in OffsetIndex to Lock
          2. Add an object called Os in kafka.utils that is something like

          object Os

          { private val osName = System.getProperty("os.name").toLowerCase val isWindows = osName.startsWith("windows") }

          We can expand this later if we need more specific OS detection capabilities or other OS-specific functionality.

          Then elsewhere we can just do
          if(Os.isWindows) lock.lock()
          and the corresponding unlock.

          Does that seem like it would work?

          Show
          Jay Kreps added a comment - Yeah, the problem is reads continue while the log is being rolled. Here are a bunch of possible solutions I can think of 1. Lock reads 2. Delay the truncate until the larger mmap is collected For (2) the problem is that it actually interferes with the recoverability of the log. If we have old log segments with a bunch of unfilled bytes in their index segment we need to recover those segments. But currently we only recover from the last flush point. So I think we need to lock read access. This isn't the end of the world but I would prefer to do this only on windows. My suggestion on implementation would be 1. Change from synchronized in OffsetIndex to Lock 2. Add an object called Os in kafka.utils that is something like object Os { private val osName = System.getProperty("os.name").toLowerCase val isWindows = osName.startsWith("windows") } We can expand this later if we need more specific OS detection capabilities or other OS-specific functionality. Then elsewhere we can just do if(Os.isWindows) lock.lock() and the corresponding unlock. Does that seem like it would work?
          Hide
          Jay Kreps added a comment -

          Here is a list of os.name values (at least according to this random page):
          http://www.javaneverdie.com/java/java-os-name-property-values/

          Show
          Jay Kreps added a comment - Here is a list of os.name values (at least according to this random page): http://www.javaneverdie.com/java/java-os-name-property-values/
          Hide
          Elizabeth Wei added a comment -

          Sounds like a good idea to have lock reads. Do you want to file a separate JIRA to address the lock read issue?

          Show
          Elizabeth Wei added a comment - Sounds like a good idea to have lock reads. Do you want to file a separate JIRA to address the lock read issue?
          Hide
          Jay Kreps added a comment -

          I think it makes sense to do it as part of this patch since unmapping is the reason we need it and without it we will core dump the process under concurrency (so I don't think we can take the resizing change without the locking change). Sound reasonable?

          If we have the Os.isWindows check I think we can make BOTH the locking and the forced mmap clenaing all be inside the isWindows check. Thisshould fix things on windows and not functionally change perf at all on linux (which is good).

          Show
          Jay Kreps added a comment - I think it makes sense to do it as part of this patch since unmapping is the reason we need it and without it we will core dump the process under concurrency (so I don't think we can take the resizing change without the locking change). Sound reasonable? If we have the Os.isWindows check I think we can make BOTH the locking and the forced mmap clenaing all be inside the isWindows check. Thisshould fix things on windows and not functionally change perf at all on linux (which is good).
          Hide
          Elizabeth Wei added a comment -

          Thanks Jay!
          Please review the patch to see if it reflects your suggestion. I'm still learning Scala, so please provide any feedback!

          Show
          Elizabeth Wei added a comment - Thanks Jay! Please review the patch to see if it reflects your suggestion. I'm still learning Scala, so please provide any feedback!
          Hide
          Timothy Chen added a comment -

          I wonder if this patch can go in soon? It's a major blocker for anyone that wants to use Kafka on windows

          Show
          Timothy Chen added a comment - I wonder if this patch can go in soon? It's a major blocker for anyone that wants to use Kafka on windows
          Hide
          Jay Kreps added a comment -

          Hey Elizabeth, this basically looks good. A couple of minor things.

          One is I'm not sure we are covering everything that needs to be locked. The next is that we are using synchronized with the lock instance which doesn't actually call lock (as in java that just acquires the monitor associated with the lock argument--sucky right?).

          I took a stab at reworking it. To make things not get too crazy I added a helper method inLock which makes the try/finally locking pattern a little more readable (hopefully).

          Would you be willing to take a detailed look at this and let me know if you think this works.

          The next thing we need to do is actually get this tested on Windows. I'm not sure if there is anyone who has access to a Windows machine and could reproduce the old problem who could verify that this patch fixes it?

          Show
          Jay Kreps added a comment - Hey Elizabeth, this basically looks good. A couple of minor things. One is I'm not sure we are covering everything that needs to be locked. The next is that we are using synchronized with the lock instance which doesn't actually call lock (as in java that just acquires the monitor associated with the lock argument--sucky right?). I took a stab at reworking it. To make things not get too crazy I added a helper method inLock which makes the try/finally locking pattern a little more readable (hopefully). Would you be willing to take a detailed look at this and let me know if you think this works. The next thing we need to do is actually get this tested on Windows. I'm not sure if there is anyone who has access to a Windows machine and could reproduce the old problem who could verify that this patch fixes it?
          Hide
          Elizabeth Wei added a comment -

          The code change looks good. Using a higher order function makes the code look a lot cleaner!

          Show
          Elizabeth Wei added a comment - The code change looks good. Using a higher order function makes the code look a lot cleaner!
          Hide
          David Lao added a comment -

          Hi Jay,
          The patch does not seem to apply cleanly on the 0.8 branch (see below). Can you look into generating a new patch for 0.8?

          git apply --check KAFKA-1008-v6.patch
          error: patch failed: core/src/main/scala/kafka/log/OffsetIndex.scala:52
          error: core/src/main/scala/kafka/log/OffsetIndex.scala: patch does not apply
          error: patch failed: core/src/main/scala/kafka/utils/Utils.scala:21
          error: core/src/main/scala/kafka/utils/Utils.scala: patch does not apply
          error: patch failed: core/src/test/scala/unit/kafka/utils/UtilsTest.scala:18
          error: core/src/test/scala/unit/kafka/utils/UtilsTest.scala: patch does not apply

          Show
          David Lao added a comment - Hi Jay, The patch does not seem to apply cleanly on the 0.8 branch (see below). Can you look into generating a new patch for 0.8? git apply --check KAFKA-1008 -v6.patch error: patch failed: core/src/main/scala/kafka/log/OffsetIndex.scala:52 error: core/src/main/scala/kafka/log/OffsetIndex.scala: patch does not apply error: patch failed: core/src/main/scala/kafka/utils/Utils.scala:21 error: core/src/main/scala/kafka/utils/Utils.scala: patch does not apply error: patch failed: core/src/test/scala/unit/kafka/utils/UtilsTest.scala:18 error: core/src/test/scala/unit/kafka/utils/UtilsTest.scala: patch does not apply
          Hide
          Elizabeth Wei added a comment -

          I generated a patch against the trunk to fix some of the issues in Jay's patch. Please check if it works!

          Show
          Elizabeth Wei added a comment - I generated a patch against the trunk to fix some of the issues in Jay's patch. Please check if it works!
          Hide
          David Lao added a comment -

          Hi Jay,
          The master branch seems to be broken on Windows. Can you look into this? or produce a patch for 0.8?

          [2013-08-26 14:09:41,761] FATAL [Replica Manager on Broker 3]: Error writing to highwatermark file: (kafka.server.ReplicaManager)
          java.io.IOException: File rename from c:\Apps\logs\broker-3\replication-offset-checkpoint.tmp to c:\Apps\logs\broker-3\replication-offset-checkpoint failed.
          at kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:61)
          at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:39)
          at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:312)
          at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:309)
          at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
          at kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:309)
          at kafka.server.ReplicaManager$$anonfun$startHighWaterMarksCheckPointThread$1.apply$mcV$sp(ReplicaManager.scala:92)
          at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
          at java.lang.Thread.run(Thread.java:722)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
          at java.lang.Thread.run(Thread.java:722)

          Show
          David Lao added a comment - Hi Jay, The master branch seems to be broken on Windows. Can you look into this? or produce a patch for 0.8? [2013-08-26 14:09:41,761] FATAL [Replica Manager on Broker 3] : Error writing to highwatermark file: (kafka.server.ReplicaManager) java.io.IOException: File rename from c:\Apps\logs\broker-3\replication-offset-checkpoint.tmp to c:\Apps\logs\broker-3\replication-offset-checkpoint failed. at kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:61) at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:39) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:312) at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:309) at scala.collection.immutable.Map$Map1.foreach(Map.scala:119) at kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scala:309) at kafka.server.ReplicaManager$$anonfun$startHighWaterMarksCheckPointThread$1.apply$mcV$sp(ReplicaManager.scala:92) at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
          Hide
          Jay Kreps added a comment -

          Attached a version rebased to 0.8.

          Show
          Jay Kreps added a comment - Attached a version rebased to 0.8.
          Hide
          Sriram Subramanian added a comment -

          Looks good.

          1. OffsetIndex.scala

          1.1 Why do you need to re-calculate this.maxEntries = this.mmap.limit / 8 after remapping in resize instead of leaving it how it was previously (recalculated during the method call of maxEntries)?
          1.2 maybeLock should be something that exist outside OffsetIndex. Seems like OS specific methods should reside together.
          1.3 readLastOffset now locks. This is a behavior change on linux. This seems to only do a read so does it need to be "maybeLock"?

          2. Should we use the autolock for the cases below

          TestUtils.scala

          method - waitUntilLeaderIsElectedOrChanged

          HighWaterCheckpoint.scala

          method - write and read

          3. Is autolock a better name than inlock?

          Show
          Sriram Subramanian added a comment - Looks good. 1. OffsetIndex.scala 1.1 Why do you need to re-calculate this.maxEntries = this.mmap.limit / 8 after remapping in resize instead of leaving it how it was previously (recalculated during the method call of maxEntries)? 1.2 maybeLock should be something that exist outside OffsetIndex. Seems like OS specific methods should reside together. 1.3 readLastOffset now locks. This is a behavior change on linux. This seems to only do a read so does it need to be "maybeLock"? 2. Should we use the autolock for the cases below TestUtils.scala method - waitUntilLeaderIsElectedOrChanged HighWaterCheckpoint.scala method - write and read 3. Is autolock a better name than inlock?
          Hide
          David Lao added a comment -

          Thanks Jay. The 0.8 patch seems to be working on Windows. Please check it in.

          Show
          David Lao added a comment - Thanks Jay. The 0.8 patch seems to be working on Windows. Please check it in.
          Hide
          Jun Rao added a comment -

          Thanks for the patch. Reviewed patch v7 for 0.8. Looks good overall. Just a couple of minor comments.

          70. OffsetIndex: Should forceUnmap() be private?

          71. Just a question. Does anyone know if the OS property shows up as windows on cygwin?

          Sriram,

          For 1.1, the purpose is probably to save the division calculation, which is a bit expensive.

          Show
          Jun Rao added a comment - Thanks for the patch. Reviewed patch v7 for 0.8. Looks good overall. Just a couple of minor comments. 70. OffsetIndex: Should forceUnmap() be private? 71. Just a question. Does anyone know if the OS property shows up as windows on cygwin? Sriram, For 1.1, the purpose is probably to save the division calculation, which is a bit expensive.
          Hide
          Neha Narkhede added a comment -

          ping Elizabeth Wei, Jay Kreps. Could you address Jun's review comments and see if we can resolve this JIRA? This is marked for the 0.8 final release

          Show
          Neha Narkhede added a comment - ping Elizabeth Wei , Jay Kreps . Could you address Jun's review comments and see if we can resolve this JIRA? This is marked for the 0.8 final release
          Hide
          Jay Kreps added a comment - - edited

          Sriram:
          1.1 This is because this.mmap can be null so you have to either acquire the lock. To avoid this I just make the max size a separate variable.
          1.2 It's actually only really useful in this class because the fact that we want to lock only on windows is very specific to the logic of this class.
          1.3 Yeah this is technically not necessary since this method is only used during initialization but I try never to have the correctness of methods depend on when they are used.

          2. Added a lock for the leader election test cases. I am skipping the other usage because that file was heavily refactored in trunk and that lock removed (I think).

          3. I intend
          inLock(x)

          { foo }

          to be read as "in lock x do foo". So I like it since it is declarative.

          Show
          Jay Kreps added a comment - - edited Sriram: 1.1 This is because this.mmap can be null so you have to either acquire the lock. To avoid this I just make the max size a separate variable. 1.2 It's actually only really useful in this class because the fact that we want to lock only on windows is very specific to the logic of this class. 1.3 Yeah this is technically not necessary since this method is only used during initialization but I try never to have the correctness of methods depend on when they are used. 2. Added a lock for the leader election test cases. I am skipping the other usage because that file was heavily refactored in trunk and that lock removed (I think). 3. I intend inLock(x) { foo } to be read as "in lock x do foo". So I like it since it is declarative.
          Hide
          Jay Kreps added a comment -

          Jun, added private on that method. Not sure about cygwin.

          Show
          Jay Kreps added a comment - Jun, added private on that method. Not sure about cygwin.
          Hide
          Sriram Subramanian added a comment -

          +1

          Show
          Sriram Subramanian added a comment - +1
          Hide
          Elizabeth Wei added a comment -

          Sorry - I am not as active on this at the moment. I am back at Exeter to start my junior year. I just did a quick test on my friend's computer:
          System.out.println(System.getProperty("os.name"));
          prints out "Windows 7" for both Windows cmd shell and cygwin/bash.

          This makes sense since cygwin is mainly a shell.

          Show
          Elizabeth Wei added a comment - Sorry - I am not as active on this at the moment. I am back at Exeter to start my junior year. I just did a quick test on my friend's computer: System.out.println(System.getProperty("os.name")); prints out "Windows 7" for both Windows cmd shell and cygwin/bash. This makes sense since cygwin is mainly a shell.
          Hide
          Jun Rao added a comment -

          Thanks for patch v8. Just one more comment.

          80. OffsetIndex: The patch synchronizes in readLastOffset(), is that necessary?

          Show
          Jun Rao added a comment - Thanks for patch v8. Just one more comment. 80. OffsetIndex: The patch synchronizes in readLastOffset(), is that necessary?
          Hide
          Jay Kreps added a comment -

          Sriram had the same comment. It is possible to reason that the existing ways the method is used don't need synchronization but I don't think the method is thread safe since it depends on both size and mmap both of which can change (so e.g. mmap could be null and a truncate call could theoretically interleave with this call). I don't think it is very safe to have methods whose correctness depends on the existing call pattern. The synchronization doesn't hurt, in any case since this is not in any critical read or write path.

          Show
          Jay Kreps added a comment - Sriram had the same comment. It is possible to reason that the existing ways the method is used don't need synchronization but I don't think the method is thread safe since it depends on both size and mmap both of which can change (so e.g. mmap could be null and a truncate call could theoretically interleave with this call). I don't think it is very safe to have methods whose correctness depends on the existing call pattern. The synchronization doesn't hurt, in any case since this is not in any critical read or write path.
          Hide
          Jun Rao added a comment -

          Sorry, I missed that comment. +1 on v8.

          Show
          Jun Rao added a comment - Sorry, I missed that comment. +1 on v8.
          Hide
          Jay Kreps added a comment -

          Attached KAFKA-1008-v9-trunk.patch which ports the final 0.8 patch to trunk and also adds a fix for the windows compatibility issue in KAFKA-1036 in OffsetCheckpoint.scala.

          Show
          Jay Kreps added a comment - Attached KAFKA-1008 -v9-trunk.patch which ports the final 0.8 patch to trunk and also adds a fix for the windows compatibility issue in KAFKA-1036 in OffsetCheckpoint.scala.
          Hide
          Jun Rao added a comment -

          Thanks for the patch for trunk. +1.

          Show
          Jun Rao added a comment - Thanks for the patch for trunk. +1.
          Hide
          Jay Kreps added a comment -

          Checked in on 0.8 and trunk.

          Show
          Jay Kreps added a comment - Checked in on 0.8 and trunk.

            People

            • Assignee:
              Jay Kreps
              Reporter:
              Elizabeth Wei
            • Votes:
              2 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 1h
                1h
                Remaining:
                Remaining Estimate - 1h
                1h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development