Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-8390

The process cannot access the file because it is being used by another process

    Details

      Description

      21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[NonPeriodicTasks:1,5,main]
      org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
       
                      at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[cassandra-all-2.1.1.jar:2.1.1]
                      at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[cassandra-all-2.1.1.jar:2.1.1]
                      at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[cassandra-all-2.1.1.jar:2.1.1]
                      at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[cassandra-all-2.1.1.jar:2.1.1]
                      at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~[cassandra-all-2.1.1.jar:2.1.1]
                      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
                      at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_71]
                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_71]
                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_71]
                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
                      at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
      Caused by: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
       
                      at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_71]
                      at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_71]
                      at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[na:1.7.0_71]
                      at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) ~[na:1.7.0_71]
                      at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) ~[na:1.7.0_71]
                      at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
                      at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[cassandra-all-2.1.1.jar:2.1.1]
                      ... 11 common frames omitted
      1. CassandraDiedWithDiskAccessModeStandardLogs.7z
        1.67 MB
        Wei Yuan Cho
      2. FSD.PNG
        40 kB
        Wei Yuan Cho
      3. NoHostAvailableLogs.zip
        5.99 MB
        Alexander Radzin
      4. recreate-CASSANDRA-8390.tgz
        2 kB
        Amichai Rothman

        Issue Links

          Activity

          Hide
          philipthompson Philip Thompson added a comment -

          Are you having this problem running Apache Cassandra on Windows?

          Show
          philipthompson Philip Thompson added a comment - Are you having this problem running Apache Cassandra on Windows?
          Hide
          ilya.komolkin Ilya Komolkin added a comment -

          Yes. Windows 8.1

          Show
          ilya.komolkin Ilya Komolkin added a comment - Yes. Windows 8.1
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          This might be due to mmap'ed index file access - Windows has some known issues with deleting files that have actively mmap'ed segments live in memory (see CASSANDRA-6993). Given your data file deleted correctly and access violation hit on index, this is the first thing that jumps out to me. If you see the following in your system.log:"DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap", adding the following to your conf\cassandra.yaml file might fix this:
          disk_access_mode: standard

          Please let me know if the above addresses this and I'll push for the 6993 backport to make it into the next 2.X release.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - This might be due to mmap'ed index file access - Windows has some known issues with deleting files that have actively mmap'ed segments live in memory (see CASSANDRA-6993 ). Given your data file deleted correctly and access violation hit on index, this is the first thing that jumps out to me. If you see the following in your system.log:"DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap", adding the following to your conf\cassandra.yaml file might fix this: disk_access_mode: standard Please let me know if the above addresses this and I'll push for the 6993 backport to make it into the next 2.X release.
          Hide
          ilya.komolkin Ilya Komolkin added a comment -

          I have searched my System and Application logs - no such events there.

          Show
          ilya.komolkin Ilya Komolkin added a comment - I have searched my System and Application logs - no such events there.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          You should be able to find "DiskAccessMode" in your system.log file for cassandra - not talking eventvwr.msc / Event Viewer.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - You should be able to find "DiskAccessMode" in your system.log file for cassandra - not talking eventvwr.msc / Event Viewer.
          Hide
          alexander_radzin Alexander Radzin added a comment - - edited

          I have the same issue with Windows 8. Here is the "DiskAccessMode" line that I found in system.log of cassandra:

          INFO  [main] 2014-12-09 16:07:25,985 DatabaseDescriptor.java:203 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
          

          I have several lines like this.

          Important: when this happens client gets NoHostAvailableException and stops working that requires restart of cassandra.

          com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
          	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
          	at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
          	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
          	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
          	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
          	at com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest1.cqlSync(CQLTest1.java:56)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
          	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
          	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
          	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
          	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
          	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
          	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
          	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
          	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
          	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
          	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202)
          	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
          Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
          	at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
          	at com.datastax.driver.core.SessionManager.execute(SessionManager.java:461)
          	at com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:497)
          	at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:87)
          	... 34 more
          

          Here are the conditions that make this problem to be reproduced.
          Our application creates new keyspace every day. The keyspace contains about 60 tables. The issue happened on production relatively seldom, however it happens in testing environment all the time because each test case creates keyspace again. I guess that the problem is not specifically in creating keyspace and tables because sometimes the problem happens when trying to run truncate.

          Cassandra DB is running using default settings. The client code looks like the following:

          		Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
          		Session session = cluster.connect();
          		String year = "2013";
          		for (int i = 1; i <= 12; i++) {
          			String yearMonth = year + i;
          			for (String template : cql.split("\\n")) {
          				String query = String.format(template, yearMonth);
          				System.out.println(query);
          				session.execute(query);
          			}
          		}
          

          Where cql contains create keyspace and a lot of create table statements.

          Interesting fact is that problem does not appear when using asynchronous call:

          		Collection<ResultSetFuture> futures = new ArrayList<>();
          
          		Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
          		Session session = cluster.connect();
          		String year = "2013";
          		for (int i = 1; i <= 1200; i++) {
          			String yearMonth = year + i;
          			for (String template : cql.split("\\n")) {
          				String query = String.format(template, yearMonth);
          				System.out.println(query);
          				ResultSetFuture future = session.executeAsync(query);
          				futures.add(future);
          			}
          		}
          
          		Futures.successfulAsList(futures).get();
          

          Although this can be a temporary workaround I will try to use the problem itself is IMHO extremely critical.

          Full source code can be found here.

          Show
          alexander_radzin Alexander Radzin added a comment - - edited I have the same issue with Windows 8. Here is the "DiskAccessMode" line that I found in system.log of cassandra: INFO [main] 2014-12-09 16:07:25,985 DatabaseDescriptor.java:203 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap I have several lines like this. Important: when this happens client gets NoHostAvailableException and stops working that requires restart of cassandra. com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36) at com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest1.cqlSync(CQLTest1.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102) at com.datastax.driver.core.SessionManager.execute(SessionManager.java:461) at com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:497) at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:87) ... 34 more Here are the conditions that make this problem to be reproduced. Our application creates new keyspace every day. The keyspace contains about 60 tables. The issue happened on production relatively seldom, however it happens in testing environment all the time because each test case creates keyspace again. I guess that the problem is not specifically in creating keyspace and tables because sometimes the problem happens when trying to run truncate . Cassandra DB is running using default settings. The client code looks like the following: Cluster cluster = Cluster.builder().addContactPoint("localhost").build(); Session session = cluster.connect(); String year = "2013"; for (int i = 1; i <= 12; i++) { String yearMonth = year + i; for (String template : cql.split("\\n")) { String query = String.format(template, yearMonth); System.out.println(query); session.execute(query); } } Where cql contains create keyspace and a lot of create table statements. Interesting fact is that problem does not appear when using asynchronous call: Collection<ResultSetFuture> futures = new ArrayList<>(); Cluster cluster = Cluster.builder().addContactPoint("localhost").build(); Session session = cluster.connect(); String year = "2013"; for (int i = 1; i <= 1200; i++) { String yearMonth = year + i; for (String template : cql.split("\\n")) { String query = String.format(template, yearMonth); System.out.println(query); ResultSetFuture future = session.executeAsync(query); futures.add(future); } } Futures.successfulAsList(futures).get(); Although this can be a temporary workaround I will try to use the problem itself is IMHO extremely critical. Full source code can be found here .
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Alexander Radzin: have you tried "disk_access_mode: standard" in your test environment to see if it resolves this issue? (see CASSANDRA-6993)

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Alexander Radzin : have you tried "disk_access_mode: standard" in your test environment to see if it resolves this issue? (see CASSANDRA-6993 )
          Hide
          alexander_radzin Alexander Radzin added a comment - - edited

          Joshua McKenzie, I tried this with the same result. I validated that the parameter was indeed used by cassandra by changing value to something illegal. This threw exception.

          Show
          alexander_radzin Alexander Radzin added a comment - - edited Joshua McKenzie, I tried this with the same result. I validated that the parameter was indeed used by cassandra by changing value to something illegal. This threw exception.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Thanks for the info Alexander - I'll try and reproduce locally w/the gist you linked. Also - thanks for the reproduction! Those are very helpful in cases like this.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Thanks for the info Alexander - I'll try and reproduce locally w/the gist you linked. Also - thanks for the reproduction! Those are very helpful in cases like this.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Alexander Radzin: how many runs does it take with the attached cqlSync test to reproduce? Also - are you running 2.1.1 or 2.1.2?

          I'm thus far unable to reproduce on either win7 or win8.1 with cqlSync, with or without memory-mapped index files.

          Also - is there an antivirus client installed in this test environment? We've seen issues w/file access violations on Windows in the past due to that as well.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Alexander Radzin : how many runs does it take with the attached cqlSync test to reproduce? Also - are you running 2.1.1 or 2.1.2? I'm thus far unable to reproduce on either win7 or win8.1 with cqlSync, with or without memory-mapped index files. Also - is there an antivirus client installed in this test environment? We've seen issues w/file access violations on Windows in the past due to that as well.
          Hide
          alexander_radzin Alexander Radzin added a comment -

          I have cassandra 2.1.2, windowss 8.1. Typically it takes 1.5 iterations of main loop to reproduce the problem with cqlSync. As a windows 8 user I have Windows Defender. It is turned on.

          I have just ran the test again and it passed. Then I changed the year from 2013 to 2014 and ran the test again. It failed when arrived to month 11.

          CREATE TABLE measure_201411.index_bcon_page_load_aggregation (partition ascii, attr ascii, value varchar, time timeuuid, bloom blob, PRIMARY KEY (partition, attr, value, time)) WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 40, 'max_threshold' : 45 } AND gc_grace_seconds = 0 AND memtable_flush_period_in_ms = 300000;
          CREATE TABLE measure_201411.bcon_page_event_aggregation (partition ascii, time timeuuid, data blob, PRIMARY KEY (partition, time)) WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 40, 'max_threshold' : 45 } AND gc_grace_seconds = 0 AND memtable_flush_period_in_ms = 300000;
          
          com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Error writing: Closed channel))
          	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
          	at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
          	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
          	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
          	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
          	at com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest.cqlSync(CQLTest.java:32)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at com.intellij.junit4.JUnit4TestRunnerUtil$IgnoreIgnoredTestJUnit4ClassRunner.runChild(JUnit4TestRunnerUtil.java:269)
          	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
          	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
          	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
          	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
          	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
          	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
          	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
          	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
          	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202)
          	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
          Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Error writing: Closed channel))
          	at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
          	at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:176)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          	at java.lang.Thread.run(Thread.java:744)
          

          I believe that probably number of keyspaces is important here. Try to increase the loop limit and run again. I hope you will reproduce the issue.

          Please pay the attention that in the beginning the test runs fast but than I feel some slowness. After that the test fails.
          Good luck.

          Show
          alexander_radzin Alexander Radzin added a comment - I have cassandra 2.1.2, windowss 8.1. Typically it takes 1.5 iterations of main loop to reproduce the problem with cqlSync. As a windows 8 user I have Windows Defender. It is turned on. I have just ran the test again and it passed. Then I changed the year from 2013 to 2014 and ran the test again. It failed when arrived to month 11. CREATE TABLE measure_201411.index_bcon_page_load_aggregation (partition ascii, attr ascii, value varchar, time timeuuid, bloom blob, PRIMARY KEY (partition, attr, value, time)) WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 40, 'max_threshold' : 45 } AND gc_grace_seconds = 0 AND memtable_flush_period_in_ms = 300000; CREATE TABLE measure_201411.bcon_page_event_aggregation (partition ascii, time timeuuid, data blob, PRIMARY KEY (partition, time)) WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 40, 'max_threshold' : 45 } AND gc_grace_seconds = 0 AND memtable_flush_period_in_ms = 300000; com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Error writing: Closed channel)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36) at com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest.cqlSync(CQLTest.java:32) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at com.intellij.junit4.JUnit4TestRunnerUtil$IgnoreIgnoredTestJUnit4ClassRunner.runChild(JUnit4TestRunnerUtil.java:269) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Error writing: Closed channel)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) I believe that probably number of keyspaces is important here. Try to increase the loop limit and run again. I hope you will reproduce the issue. Please pay the attention that in the beginning the test runs fast but than I feel some slowness. After that the test fails. Good luck.
          Hide
          alexander_radzin Alexander Radzin added a comment -

          FYI: I have Windows Defender on my computer. I have just tried to turn it off and got the same result.

          Show
          alexander_radzin Alexander Radzin added a comment - FYI: I have Windows Defender on my computer. I have just tried to turn it off and got the same result.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Upped to 40 iterations, ran 3 times w/out AV and then put windows defender on the box, activated, mmap on index files and ran up to 120 iterations and still can't reproduce. I'm on spinning disk here w/some other factors so that may play into it if this is a timing issue.

          The failure to delete within SSTableDeletingTask after successfully deleting the data file is fairly unique to this ticket as far as I know, and while it could indicate logic errors in SSTR.tidy / SSTR.scheduleTidy w/regards to unmapping mmap'ed segments, I'd expect to see this show up with more frequency if it was a logical error such as that.

          Something else on that system has a handle to that file open and we can't control the ecosystem outside Cassandra which has shown to be a headache on Windows. If either of you could reproduce and run Handle on your system, preferably with admin credentials on the command-prompt, and search for one of these files you're getting an error on it may help us narrow this down.

          I haven't tried running from within Upsource - Alexander Radzin: are you running C* from within upsource as well or running it natively from cassandra.bat?

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Upped to 40 iterations, ran 3 times w/out AV and then put windows defender on the box, activated, mmap on index files and ran up to 120 iterations and still can't reproduce. I'm on spinning disk here w/some other factors so that may play into it if this is a timing issue. The failure to delete within SSTableDeletingTask after successfully deleting the data file is fairly unique to this ticket as far as I know, and while it could indicate logic errors in SSTR.tidy / SSTR.scheduleTidy w/regards to unmapping mmap'ed segments, I'd expect to see this show up with more frequency if it was a logical error such as that. Something else on that system has a handle to that file open and we can't control the ecosystem outside Cassandra which has shown to be a headache on Windows. If either of you could reproduce and run Handle on your system, preferably with admin credentials on the command-prompt, and search for one of these files you're getting an error on it may help us narrow this down. I haven't tried running from within Upsource - Alexander Radzin : are you running C* from within upsource as well or running it natively from cassandra.bat?
          Hide
          alexander_radzin Alexander Radzin added a comment -

          Joshua McKenzie, thank you for your efforts.

          Today I managed to reproduce the issue on other developer's laptop, however everything worked well on integration machine. Laptops are running windows 7 and 8, integration machine windows 2002, R2.

          I played with sys internals and saw that there are 2 processes that act in cassandra data directory: windows defender and SVN. I stopped both and ran test again with process monitor from sysinternals and the problem happened again although the view of process monitor was clear (i.e. only java.exe) was active in this directory.

          Following your suggestion I ran the test again and executed hanle while test was running. And here are the results:

          C:\Windows\system32>c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data
          
          Handle v3.51
          Copyright (C) 1997-2013 Mark Russinovich
          Sysinternals - www.sysinternals.com
          
          java.exe           pid: 1852   type: File           658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           958: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-1-Index.db
          java.exe           pid: 1852   type: File           968: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-1-Data.db
          java.exe           pid: 1852   type: File           96C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-1-Index.db
          java.exe           pid: 1852   type: File           970: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-1-Data.db
          java.exe           pid: 1852   type: File           974: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-1-Index.db
          java.exe           pid: 1852   type: File           978: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-1-Data.db
          
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data
          
          Handle v3.51
          Copyright (C) 1997-2013 Mark Russinovich
          Sysinternals - www.sysinternals.com
          
          java.exe           pid: 1852   type: File           658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           958: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-26-Data.db
          java.exe           pid: 1852   type: File           97C: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db
          java.exe           pid: 1852   type: File           B74: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-29-Data.db
          java.exe           pid: 1852   type: File           B7C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-26-Index.db
          java.exe           pid: 1852   type: File           BA0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-27-Data.db
          java.exe           pid: 1852   type: File           BA4: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-25-Data.db
          java.exe           pid: 1852   type: File           BF8: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-27-Index.db
          
          C:\Windows\system32>c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data
          
          Handle v3.51
          Copyright (C) 1997-2013 Mark Russinovich
          Sysinternals - www.sysinternals.com
          
          java.exe           pid: 1852   type: File           658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           968: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-98-Index.db
          java.exe           pid: 1852   type: File           97C: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db
          java.exe           pid: 1852   type: File           B44: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-102-Index.db
          java.exe           pid: 1852   type: File           B5C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-99-Index.db
          java.exe           pid: 1852   type: File           B60: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-101-Data.db
          java.exe           pid: 1852   type: File           B70: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-98-Data.db
          java.exe           pid: 1852   type: File           B7C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-99-Data.db
          java.exe           pid: 1852   type: File           B88: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-97-Data.db
          java.exe           pid: 1852   type: File           BA4: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-97-Data.db
          java.exe           pid: 1852   type: File           BAC: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-102-Data.db
          java.exe           pid: 1852   type: File           BC8: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-99-Index.db
          java.exe           pid: 1852   type: File           BE0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-98-Index.db
          java.exe           pid: 1852   type: File           BE4: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-99-Data.db
          java.exe           pid: 1852   type: File           BF0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-98-Data.db
          
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>
          C:\Windows\system32>date /t && c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data
          Sun 12/14/2014
          
          Handle v3.51
          Copyright (C) 1997-2013 Mark Russinovich
          Sysinternals - www.sysinternals.com
          
          java.exe           pid: 1852   type: File           658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log
          java.exe           pid: 1852   type: File           740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log
          java.exe           pid: 1852   type: File           B20: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-249-Data.db
          java.exe           pid: 1852   type: File           B24: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-250-Data.db
          java.exe           pid: 1852   type: File           B3C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-250-Data.db
          java.exe           pid: 1852   type: File           B44: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-251-Index.db
          java.exe           pid: 1852   type: File           B70: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-255-Index.db
          java.exe           pid: 1852   type: File           B80: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-249-Data.db
          java.exe           pid: 1852   type: File           B94: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-256-Index.db
          java.exe           pid: 1852   type: File           BAC: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-250-Index.db
          java.exe           pid: 1852   type: File           BB0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-256-Data.db
          java.exe           pid: 1852   type: File           BBC: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-255-Data.db
          java.exe           pid: 1852   type: File           BD0: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db
          java.exe           pid: 1852   type: File           BD8: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db
          java.exe           pid: 1852   type: File           C48: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-254-Data.db
          java.exe           pid: 1852   type: File           C58: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-250-Index.db
          java.exe           pid: 1852   type: File           C78: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-251-Data.db
          java.exe           pid: 1852   type: File           C7C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-251-Index.db
          java.exe           pid: 1852   type: File           C80: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-251-Data.db
          
          C:\Windows\system32>
          

          The last run is done after exception appeared.
          Unfortunately I do not know what I have to pay my attention on. Please let me know what kind of additional information can help you. BTW if you want we even can organize online session against one of the machines where the problem happens.

          Show
          alexander_radzin Alexander Radzin added a comment - Joshua McKenzie, thank you for your efforts. Today I managed to reproduce the issue on other developer's laptop, however everything worked well on integration machine. Laptops are running windows 7 and 8, integration machine windows 2002, R2. I played with sys internals and saw that there are 2 processes that act in cassandra data directory: windows defender and SVN. I stopped both and ran test again with process monitor from sysinternals and the problem happened again although the view of process monitor was clear (i.e. only java.exe) was active in this directory. Following your suggestion I ran the test again and executed hanle while test was running. And here are the results: C:\Windows\system32>c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data Handle v3.51 Copyright (C) 1997-2013 Mark Russinovich Sysinternals - www.sysinternals.com java.exe pid: 1852 type: File 658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File 744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File 958: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-1-Index.db java.exe pid: 1852 type: File 968: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-1-Data.db java.exe pid: 1852 type: File 96C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-1-Index.db java.exe pid: 1852 type: File 970: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-1-Data.db java.exe pid: 1852 type: File 974: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-1-Index.db java.exe pid: 1852 type: File 978: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-1-Data.db C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32>c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data Handle v3.51 Copyright (C) 1997-2013 Mark Russinovich Sysinternals - www.sysinternals.com java.exe pid: 1852 type: File 658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File 744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File 958: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-26-Data.db java.exe pid: 1852 type: File 97C: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db java.exe pid: 1852 type: File B74: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-29-Data.db java.exe pid: 1852 type: File B7C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-26-Index.db java.exe pid: 1852 type: File BA0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-27-Data.db java.exe pid: 1852 type: File BA4: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-25-Data.db java.exe pid: 1852 type: File BF8: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-27-Index.db C:\Windows\system32>c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data Handle v3.51 Copyright (C) 1997-2013 Mark Russinovich Sysinternals - www.sysinternals.com java.exe pid: 1852 type: File 658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File 744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File 968: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-98-Index.db java.exe pid: 1852 type: File 97C: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db java.exe pid: 1852 type: File B44: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-102-Index.db java.exe pid: 1852 type: File B5C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-99-Index.db java.exe pid: 1852 type: File B60: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-101-Data.db java.exe pid: 1852 type: File B70: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-98-Data.db java.exe pid: 1852 type: File B7C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-99-Data.db java.exe pid: 1852 type: File B88: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-97-Data.db java.exe pid: 1852 type: File BA4: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-97-Data.db java.exe pid: 1852 type: File BAC: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-102-Data.db java.exe pid: 1852 type: File BC8: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-99-Index.db java.exe pid: 1852 type: File BE0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-98-Index.db java.exe pid: 1852 type: File BE4: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-99-Data.db java.exe pid: 1852 type: File BF0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-98-Data.db C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32> C:\Windows\system32>date /t && c:\progs\SysinternalsSuite\handle -a c:\progs\apache-cassandra-2.1.2\data Sun 12/14/2014 Handle v3.51 Copyright (C) 1997-2013 Mark Russinovich Sysinternals - www.sysinternals.com java.exe pid: 1852 type: File 658: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 6C8: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506027.log java.exe pid: 1852 type: File 740: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File 744: C:\progs\apache-cassandra-2.1.2\data\commitlog\CommitLog-4-1418558506028.log java.exe pid: 1852 type: File B20: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-249-Data.db java.exe pid: 1852 type: File B24: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-250-Data.db java.exe pid: 1852 type: File B3C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-250-Data.db java.exe pid: 1852 type: File B44: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-251-Index.db java.exe pid: 1852 type: File B70: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-255-Index.db java.exe pid: 1852 type: File B80: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-249-Data.db java.exe pid: 1852 type: File B94: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-256-Index.db java.exe pid: 1852 type: File BAC: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-250-Index.db java.exe pid: 1852 type: File BB0: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-256-Data.db java.exe pid: 1852 type: File BBC: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-255-Data.db java.exe pid: 1852 type: File BD0: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db java.exe pid: 1852 type: File BD8: C:\progs\apache-cassandra-2.1.2\data\data\system\local-7ad54392bcdd35a684174e047860b377\system-local-ka-5-Data.db java.exe pid: 1852 type: File C48: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-254-Data.db java.exe pid: 1852 type: File C58: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-250-Index.db java.exe pid: 1852 type: File C78: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-251-Data.db java.exe pid: 1852 type: File C7C: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-251-Index.db java.exe pid: 1852 type: File C80: C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-251-Data.db C:\Windows\system32> The last run is done after exception appeared. Unfortunately I do not know what I have to pay my attention on. Please let me know what kind of additional information can help you. BTW if you want we even can organize online session against one of the machines where the problem happens.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          That's unfortunate - I was hoping it was a longer-running contention handle.exe would catch but it looks like either a) it's quick and gone by the time we captured that or b) the access violation is encapsulated within the java process itself.

          You mentioned Process Monitor - if you would be willing to reproduce the problem while running a ProcMon trace with a filter for "Column: Path, Relation: contains, Value: $env:CASSANDRA_HOME\data", i.e. only include access to files in your data directory, flag Filter->Enable Advanced Output, and also get a clean system.log file from that failed run that might give us a more definitive answer as to whether it's internal or external to the java process.

          Along with the above, if you're comfortable running msinfo32.exe and saving the output it should give us the running software and loaded kernel modules that correlate with messages from the ProcMon trace. If you'd prefer to take that info offline you can email the above data to me directly as well.

          The fact that you're able to reproduce this on win7 and win8 is quite interesting; I use win7 as my dev environment (as do a few of the other committers) and to my knowledge we haven't seen this yet. Are both / either of those machines company-issued with some form of security software on them that could be playing into this?

          Thanks for the continued assistance!

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - That's unfortunate - I was hoping it was a longer-running contention handle.exe would catch but it looks like either a) it's quick and gone by the time we captured that or b) the access violation is encapsulated within the java process itself. You mentioned Process Monitor - if you would be willing to reproduce the problem while running a ProcMon trace with a filter for "Column: Path, Relation: contains, Value: $env:CASSANDRA_HOME\data", i.e. only include access to files in your data directory, flag Filter->Enable Advanced Output, and also get a clean system.log file from that failed run that might give us a more definitive answer as to whether it's internal or external to the java process. Along with the above, if you're comfortable running msinfo32.exe and saving the output it should give us the running software and loaded kernel modules that correlate with messages from the ProcMon trace. If you'd prefer to take that info offline you can email the above data to me directly as well. The fact that you're able to reproduce this on win7 and win8 is quite interesting; I use win7 as my dev environment (as do a few of the other committers) and to my knowledge we haven't seen this yet. Are both / either of those machines company-issued with some form of security software on them that could be playing into this? Thanks for the continued assistance!
          Hide
          alexander_radzin Alexander Radzin added a comment - - edited

          Attached Log files (NoHostAvailable.zip) from run that reproduces NoHostAvailable problem. The cqlSync successfully finished 11 iterations and failed while doing the iterration #12.

          This is the client side exception:

          com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Connection has been closed))
          	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
          	at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
          	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
          	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
          	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
          	at com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest.cqlSync(CQLTest.java:32)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at com.intellij.junit4.JUnit4TestRunnerUtil$IgnoreIgnoredTestJUnit4ClassRunner.runChild(JUnit4TestRunnerUtil.java:269)
          	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
          	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
          	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
          	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
          	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
          	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
          	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
          	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
          	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202)
          	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
          Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Connection has been closed))
          	at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
          	at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:176)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          	at java.lang.Thread.run(Thread.java:744)
          

          Here is the server side exception:

          INFO  16:28:12 Compacted 4 sstables to [c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-881,].  297,142 bytes to 295,369 (~99% of original) in 375ms = 0.751162MB/s.  16 total partitions merged to 13.  Partition merge counts were {1:12, 4:1, }
          ERROR 16:28:12 Unable to delete c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db (it will be removed on server restart; we'll also retry after GC)
          ERROR 16:28:12 Exception in thread Thread[NonPeriodicTasks:1,5,main]
          org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-877-Index.db: The process cannot access the file because it is being used by another process.
          
                  at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[apache-cassandra-2.1.2.jar:2.1.2]
                  at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[apache-cassandra-2.1.2.jar:2.1.2]
                  at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[apache-cassandra-2.1.2.jar:2.1.2]
                  at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[apache-cassandra-2.1.2.jar:2.1.2]
                  at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:665) ~[apache-cassandra-2.1.2.jar:2.1.2]
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45]
                  at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_45]
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_45]
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_45]
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_45]
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
                  at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
          Caused by: java.nio.file.FileSystemException: c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-877-Index.db: The process cannot access the file because it is being used by another process.
          
                  at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_45]
                  at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_45]
                  at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[na:1.7.0_45]
                  at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) ~[na:1.7.0_45]
                  at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) ~[na:1.7.0_45]
                  at java.nio.file.Files.delete(Files.java:1077) ~[na:1.7.0_45]
                  at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[apache-cassandra-2.1.2.jar:2.1.2]
                  ... 11 common frames omitted
          ERROR 16:28:12 Stopping gossiper
          WARN  16:28:12 Stopping gossip by operator request
          INFO  16:28:12 Announcing shutdown
          INFO  16:28:12 Compacted 4 sstables to [c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-881,].  302,356 bytes to 301,730 (~99% of original) in 396ms = 0.726647MB/s.  16 total partitions merged to 13.  Partition merge counts were {1:12, 4:1, }
          Show
          alexander_radzin Alexander Radzin added a comment - - edited Attached Log files (NoHostAvailable.zip) from run that reproduces NoHostAvailable problem. The cqlSync successfully finished 11 iterations and failed while doing the iterration #12. This is the client side exception: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Connection has been closed)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36) at com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest.cqlSync(CQLTest.java:32) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at com.intellij.junit4.JUnit4TestRunnerUtil$IgnoreIgnoredTestJUnit4ClassRunner.runChild(JUnit4TestRunnerUtil.java:269) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Connection has been closed)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Here is the server side exception: INFO 16:28:12 Compacted 4 sstables to [c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-881,]. 297,142 bytes to 295,369 (~99% of original) in 375ms = 0.751162MB/s. 16 total partitions merged to 13. Partition merge counts were {1:12, 4:1, } ERROR 16:28:12 Unable to delete c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db (it will be removed on server restart; we'll also retry after GC) ERROR 16:28:12 Exception in thread Thread[NonPeriodicTasks:1,5,main] org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-877-Index.db: The process cannot access the file because it is being used by another process. at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:665) ~[apache-cassandra-2.1.2.jar:2.1.2] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_45] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_45] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] Caused by: java.nio.file.FileSystemException: c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-877-Index.db: The process cannot access the file because it is being used by another process. at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_45] at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_45] at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[na:1.7.0_45] at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) ~[na:1.7.0_45] at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) ~[na:1.7.0_45] at java.nio.file.Files.delete(Files.java:1077) ~[na:1.7.0_45] at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[apache-cassandra-2.1.2.jar:2.1.2] ... 11 common frames omitted ERROR 16:28:12 Stopping gossiper WARN 16:28:12 Stopping gossip by operator request INFO 16:28:12 Announcing shutdown INFO 16:28:12 Compacted 4 sstables to [c:\progs\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-881,]. 302,356 bytes to 301,730 (~99% of original) in 396ms = 0.726647MB/s. 16 total partitions merged to 13. Partition merge counts were {1:12, 4:1, }
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          It looks like the windows defender mini filter driver is still active in the kernel from the attached msinfo output: WinDefend / Windows Defender Service still live, wdfilter driver (Windows Defender Mini-Filter Driver) still loaded in kernel.

          I don't see any of the expected errors from your procmon trace for the first file mentioned in the server log above - the messages for that file around that time stamp show:

          "Time of Day","Process Name","PID","Operation","Path","Result","Detail"
          "10:28:12.8039785 AM","System","4","IRP_MJ_SET_INFORMATION","C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db","SUCCESS","Type: SetEndOfFileInformationFile, EndOfFile: 1,012"
          "10:28:12.8039968 AM","System","4","FASTIO_ACQUIRE_FOR_SECTION_SYNCHRONIZATION","C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db","SUCCESS","SyncType: SyncTypeOther"
          "10:28:12.8040110 AM","System","4","FASTIO_RELEASE_FOR_SECTION_SYNCHRONIZATION","C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db","SUCCESS",""
          

          When I run a local synthetic file deletion sharing error (what you're seeing), my procmon output shows the following:

          "Time of Day","Process Name","PID","Operation","Path","Result","Detail"
          "1:46:45.8623305 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a"
          "1:46:45.8654532 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Read Control, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a"
          "1:46:45.8655184 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Synchronize, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a"
          "1:46:45.8658356 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Read Control, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a"
          "1:46:45.8658914 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Synchronize, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a"
          "1:46:45.9305935 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Generic Read/Write, Disposition: Open, Options: No Buffering, Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: n/a"
          "1:46:45.9524313 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Generic Read/Write, Disposition: Open, Options: No Buffering, Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: n/a"
          

          Since your procmon trace has nothing other than RESULT: SUCCESS in it, unfortunately it doesn't give us any more information as to the problem at hand.

          The only other suggestion I have is to completely stop the Windows Defender service and see if you can still reproduce. Beyond that I'm out of ideas - Philip Thompson could you try reproducing this on a win7 machine?

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - It looks like the windows defender mini filter driver is still active in the kernel from the attached msinfo output: WinDefend / Windows Defender Service still live, wdfilter driver (Windows Defender Mini-Filter Driver) still loaded in kernel. I don't see any of the expected errors from your procmon trace for the first file mentioned in the server log above - the messages for that file around that time stamp show: "Time of Day","Process Name","PID","Operation","Path","Result","Detail" "10:28:12.8039785 AM","System","4","IRP_MJ_SET_INFORMATION","C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db","SUCCESS","Type: SetEndOfFileInformationFile, EndOfFile: 1,012" "10:28:12.8039968 AM","System","4","FASTIO_ACQUIRE_FOR_SECTION_SYNCHRONIZATION","C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db","SUCCESS","SyncType: SyncTypeOther" "10:28:12.8040110 AM","System","4","FASTIO_RELEASE_FOR_SECTION_SYNCHRONIZATION","C:\progs\apache-cassandra-2.1.2\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-879-Data.db","SUCCESS","" When I run a local synthetic file deletion sharing error (what you're seeing), my procmon output shows the following: "Time of Day","Process Name","PID","Operation","Path","Result","Detail" "1:46:45.8623305 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a" "1:46:45.8654532 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Read Control, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a" "1:46:45.8655184 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Synchronize, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a" "1:46:45.8658356 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Read Control, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a" "1:46:45.8658914 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Read Attributes, Delete, Synchronize, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a" "1:46:45.9305935 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Generic Read/Write, Disposition: Open, Options: No Buffering, Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: n/a" "1:46:45.9524313 PM","Explorer.EXE","4804","IRP_MJ_CREATE","C:\tmp\testfile.txt","SHARING VIOLATION","Desired Access: Generic Read/Write, Disposition: Open, Options: No Buffering, Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: n/a" Since your procmon trace has nothing other than RESULT: SUCCESS in it, unfortunately it doesn't give us any more information as to the problem at hand. The only other suggestion I have is to completely stop the Windows Defender service and see if you can still reproduce. Beyond that I'm out of ideas - Philip Thompson could you try reproducing this on a win7 machine?
          Hide
          philipthompson Philip Thompson added a comment -

          No luck reproducing. I'm able to run the tests for a few hours with no problems, I increased the number of iterations in the loop and tried changing the year used. Based on the hypothesis it was an AV issue, I even ran a few active scans during the tests and that didn't affect anything.

          Show
          philipthompson Philip Thompson added a comment - No luck reproducing. I'm able to run the tests for a few hours with no problems, I increased the number of iterations in the loop and tried changing the year used. Based on the hypothesis it was an AV issue, I even ran a few active scans during the tests and that didn't affect anything.
          Hide
          yarongu Yaron Gueta added a comment -

          Well, for me, it seems to be reproducible constantly. After investigating it for a while, it seems like the main issue is not the file lock but the result of it, which is, stopping Cassandra listeners and making it unreachable.

          I will begin with the file lock issue,
          The file lock seems to be as a result of compaction that didn't release the handle yet, and it will be probably released after a while. This is not something which is new for Cassandra, after all, we can see a lot of messages like

          ERROR [NonPeriodicTasks:1] 2015-01-02 11:53:13,291 SSTableDeletingTask.java:89 - Unable to delete c:\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-112-Data.db (it will be removed on server restart; we'll also retry after GC)
          

          These file will be removed eventually, no harm done.

          This specific behavior is part of SSTableDeletingTask.java

          SSTableDeletingTask.java
           public void run()
              {
                  long size = referent.bytesOnDisk();
          
                  if (tracker != null)
                      tracker.notifyDeleting(referent);
          
                  if (referent.readMeter != null)
                      SystemKeyspace.clearSSTableReadMeter(referent.getKeyspaceName(), referent.getColumnFamilyName(), referent.descriptor.generation);
          
                  // If we can't successfully delete the DATA component, set the task to be retried later: see above
                  File datafile = new File(desc.filenameFor(Component.DATA));
                  if (!datafile.delete())
                  {
                      logger.error("Unable to delete {} (it will be removed on server restart; we'll also retry after GC)", datafile);
                      failedTasks.add(this);
                      return;
                  }
                  // let the remainder be cleaned up by delete
                  SSTable.delete(desc, Sets.difference(components, Collections.singleton(Component.DATA)));
                  if (tracker != null)
                      tracker.spaceReclaimed(size);
              }
          

          The line:

           if (!datafile.delete())
          

          does not throw an exception and the problem is being taken care of later on.

          The problem is with the following section

          SSTable.delete(desc, Sets.difference(components, Collections.singleton(Component.DATA)));
          

          This line leads to

          FileUtils.java
          public static void deleteWithConfirm(File file)
              {
                  assert file.exists() : "attempted to delete non-existing file " + file.getName();
                  if (logger.isDebugEnabled())
                      logger.debug("Deleting {}", file.getName());
                  try
                  {
                      Files.delete(file.toPath());
                  }
                  catch (IOException e)
                  {
                      throw new FSWriteError(e, file);
                  }
              }
          

          which throws an exception which no one catches and it goes to the UncaughtExceptionHandler.

          The UncaughtExceptionHandler lead to

          FileUtils.java
          public static void handleFSError(FSError e)
              {
                  JVMStabilityInspector.inspectThrowable(e);
                  switch (DatabaseDescriptor.getDiskFailurePolicy())
                  {
                      case stop_paranoid:
                      case stop:
                          StorageService.instance.stopTransports();
                          break;
                      case best_effort:
                          // for both read and write errors mark the path as unwritable.
                          BlacklistedDirectories.maybeMarkUnwritable(e.path);
                          if (e instanceof FSReadError)
                          {
                              File directory = BlacklistedDirectories.maybeMarkUnreadable(e.path);
                              if (directory != null)
                                  Keyspace.removeUnreadableSSTables(directory);
                          }
                          break;
                      case ignore:
                          // already logged, so left nothing to do
                          break;
                      default:
                          throw new IllegalStateException();
                  }
              }
          

          and the StorageService.instance.stopTransports(); is used. This, of course, makes Cassandra unreachable and the client to throw NoHostAvailableException.

          The question is, file locks and deleting difficulties can happen, the decision to stop Cassandra listeners and make it unreachable is too risky to my opinion. Can it be handled that same way the other deleting tasks failures?

          Thank you for the support.

          Show
          yarongu Yaron Gueta added a comment - Well, for me, it seems to be reproducible constantly. After investigating it for a while, it seems like the main issue is not the file lock but the result of it, which is, stopping Cassandra listeners and making it unreachable. I will begin with the file lock issue, The file lock seems to be as a result of compaction that didn't release the handle yet, and it will be probably released after a while. This is not something which is new for Cassandra, after all, we can see a lot of messages like ERROR [NonPeriodicTasks:1] 2015-01-02 11:53:13,291 SSTableDeletingTask.java:89 - Unable to delete c:\apache-cassandra-2.1.2\data\data\system\schema_columns-296e9c049bec3085827dc17d3df2122a\system-schema_columns-ka-112-Data.db (it will be removed on server restart; we'll also retry after GC) These file will be removed eventually, no harm done. This specific behavior is part of SSTableDeletingTask.java SSTableDeletingTask.java public void run() { long size = referent.bytesOnDisk(); if (tracker != null ) tracker.notifyDeleting(referent); if (referent.readMeter != null ) SystemKeyspace.clearSSTableReadMeter(referent.getKeyspaceName(), referent.getColumnFamilyName(), referent.descriptor.generation); // If we can't successfully delete the DATA component, set the task to be retried later: see above File datafile = new File(desc.filenameFor(Component.DATA)); if (!datafile.delete()) { logger.error( "Unable to delete {} (it will be removed on server restart; we'll also retry after GC)" , datafile); failedTasks.add( this ); return ; } // let the remainder be cleaned up by delete SSTable.delete(desc, Sets.difference(components, Collections.singleton(Component.DATA))); if (tracker != null ) tracker.spaceReclaimed(size); } The line: if (!datafile.delete()) does not throw an exception and the problem is being taken care of later on. The problem is with the following section SSTable.delete(desc, Sets.difference(components, Collections.singleton(Component.DATA))); This line leads to FileUtils.java public static void deleteWithConfirm(File file) { assert file.exists() : "attempted to delete non-existing file " + file.getName(); if (logger.isDebugEnabled()) logger.debug( "Deleting {}" , file.getName()); try { Files.delete(file.toPath()); } catch (IOException e) { throw new FSWriteError(e, file); } } which throws an exception which no one catches and it goes to the UncaughtExceptionHandler. The UncaughtExceptionHandler lead to FileUtils.java public static void handleFSError(FSError e) { JVMStabilityInspector.inspectThrowable(e); switch (DatabaseDescriptor.getDiskFailurePolicy()) { case stop_paranoid: case stop: StorageService.instance.stopTransports(); break ; case best_effort: // for both read and write errors mark the path as unwritable. BlacklistedDirectories.maybeMarkUnwritable(e.path); if (e instanceof FSReadError) { File directory = BlacklistedDirectories.maybeMarkUnreadable(e.path); if (directory != null ) Keyspace.removeUnreadableSSTables(directory); } break ; case ignore: // already logged, so left nothing to do break ; default : throw new IllegalStateException(); } } and the StorageService.instance.stopTransports(); is used. This, of course, makes Cassandra unreachable and the client to throw NoHostAvailableException. The question is, file locks and deleting difficulties can happen, the decision to stop Cassandra listeners and make it unreachable is too risky to my opinion. Can it be handled that same way the other deleting tasks failures? Thank you for the support.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          In SSTableDeletingTask, we function under the assumption that once we're successfully able to delete the data file we should be able to delete the other components of an SSTable without issue as all internal handles will be closed at this time and mapped buffers unmapped. The problem comes in when we have external applications (A/V, Windows Search, anything that provides a file system driver in the kernel) that hold handles to a file that we're trying to delete as our deletion could fail. Given the import of failed file deletion (filling up a drive and killing a node), by default the disk_failure_policy in conf/cassandra.yaml is to stop.

          File locks and deleting difficulties on a Cassandra cluster should not happen. This is why we've pursued CASSANDRA-4050 for the 3.0 release and have temporarily disabled memory-mapped I/O until we have a more robust solution to work around file deletion issues on Windows.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - In SSTableDeletingTask, we function under the assumption that once we're successfully able to delete the data file we should be able to delete the other components of an SSTable without issue as all internal handles will be closed at this time and mapped buffers unmapped. The problem comes in when we have external applications (A/V, Windows Search, anything that provides a file system driver in the kernel) that hold handles to a file that we're trying to delete as our deletion could fail. Given the import of failed file deletion (filling up a drive and killing a node), by default the disk_failure_policy in conf/cassandra.yaml is to stop. File locks and deleting difficulties on a Cassandra cluster should not happen. This is why we've pursued CASSANDRA-4050 for the 3.0 release and have temporarily disabled memory-mapped I/O until we have a more robust solution to work around file deletion issues on Windows.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Side note - for a list of file system drivers running on your machine, start->run->msinfo32, Software Environment->System Drivers, sort by "Type" and then "State" (control+click it). Anything listed as Started=Yes and State=Running is active.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Side note - for a list of file system drivers running on your machine, start->run->msinfo32, Software Environment->System Drivers, sort by "Type" and then "State" (control+click it). Anything listed as Started=Yes and State=Running is active.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Closing out as cannot reproduce for now. Alexander Radzin / Ilya Komolkin: if either of you can get us more / new information about this we can revisit it, but thus far it looks like it's specific to your environment and possibly the file system drivers active on your machines.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Closing out as cannot reproduce for now. Alexander Radzin / Ilya Komolkin : if either of you can get us more / new information about this we can revisit it, but thus far it looks like it's specific to your environment and possibly the file system drivers active on your machines.
          Hide
          choweiyuan Wei Yuan Cho added a comment - - edited

          Hi there,

          I have a similar error as this thread and I'm running Windows Server 2008.
          I have set to DiskAccessMode to standard and I'm still getting the following error

          INFO [main] 2015-01-23 16:41:01,578 DatabaseDescriptor.java:211 - DiskAccessMode is standard, indexAccessMode is standard

          ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,284 SSTableDeletingTask.java:89 - Unable to delete \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137342-Data.db (it will be removed on server restart; we'll also retry after GC)
          ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,284 SSTableDeletingTask.java:89 - Unable to delete \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137341-Data.db (it will be removed on server restart; we'll also retry after GC)
          INFO [CompactionExecutor:3644] 2015-01-30 22:02:02,284 CompactionTask.java:252 - Compacted 4 sstables to [\var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137344,]. 4,830,501 bytes to 2,453,934 (~50% of original) in 1,433ms = 1.633115MB/s. 13 total partitions merged to 4. Partition merge counts were

          Unknown macro: {1}

          INFO [CompactionExecutor:3645] 2015-01-30 22:02:02,565 CompactionTask.java:252 - Compacted 4 sstables to [\var\lib\cassandra\data\system\compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b\system-compactions_in_progress-ka-5659,]. 424 bytes to 42 (~9% of original) in 285ms = 0.000141MB/s. 4 total partitions merged to 1. Partition merge counts were

          Unknown macro: {2}

          ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,659 CassandraDaemon.java:153 - Exception in thread Thread[NonPeriodicTasks:1,5,main]
          org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137339-Index.db: The process cannot access the file because it is being used by another process.

          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[apache-cassandra-2.1.1.jar:2.1.1]
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[apache-cassandra-2.1.1.jar:2.1.1]
          at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[apache-cassandra-2.1.1.jar:2.1.1]
          at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[apache-cassandra-2.1.1.jar:2.1.1]
          at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~[apache-cassandra-2.1.1.jar:2.1.1]
          at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_71]
          at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_71]
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~[na:1.7.0_71]
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[na:1.7.0_71]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_71]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_71]
          at java.lang.Thread.run(Unknown Source) [na:1.7.0_71]
          Caused by: java.nio.file.FileSystemException: \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137339-Index.db: The process cannot access the file because it is being used by another process.

          at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) ~[na:1.7.0_71]
          at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.7.0_71]
          at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.7.0_71]
          at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source) ~[na:1.7.0_71]
          at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source) ~[na:1.7.0_71]
          at java.nio.file.Files.delete(Unknown Source) ~[na:1.7.0_71]
          at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[apache-cassandra-2.1.1.jar:2.1.1]
          ... 11 common frames omitted
          ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,925 StorageService.java:377 - Stopping gossiper
          WARN [NonPeriodicTasks:1] 2015-01-30 22:02:02,925 StorageService.java:291 - Stopping gossip by operator request
          INFO [NonPeriodicTasks:1] 2015-01-30 22:02:02,925 Gossiper.java:1317 - Announcing shutdown
          ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:04,925 StorageService.java:382 - Stopping RPC server
          INFO [NonPeriodicTasks:1] 2015-01-30 22:02:04,925 ThriftServer.java:142 - Stop listening to thrift clients
          ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:04,956 StorageService.java:387 - Stopping native transport
          INFO [NonPeriodicTasks:1] 2015-01-30 22:02:05,519 Server.java:213 - Stop listening for CQL clients

          I've read that 3.0 will fix the root cause of this but is there any workaround I could do about this? I'll attach my log files

          Attached
          https://issues.apache.org/jira/secure/attachment/12695903/CassandraDiedWithDiskAccessModeStandardLogs.7z

          Show
          choweiyuan Wei Yuan Cho added a comment - - edited Hi there, I have a similar error as this thread and I'm running Windows Server 2008. I have set to DiskAccessMode to standard and I'm still getting the following error INFO [main] 2015-01-23 16:41:01,578 DatabaseDescriptor.java:211 - DiskAccessMode is standard, indexAccessMode is standard ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,284 SSTableDeletingTask.java:89 - Unable to delete \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137342-Data.db (it will be removed on server restart; we'll also retry after GC) ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,284 SSTableDeletingTask.java:89 - Unable to delete \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137341-Data.db (it will be removed on server restart; we'll also retry after GC) INFO [CompactionExecutor:3644] 2015-01-30 22:02:02,284 CompactionTask.java:252 - Compacted 4 sstables to [\var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137344,] . 4,830,501 bytes to 2,453,934 (~50% of original) in 1,433ms = 1.633115MB/s. 13 total partitions merged to 4. Partition merge counts were Unknown macro: {1} INFO [CompactionExecutor:3645] 2015-01-30 22:02:02,565 CompactionTask.java:252 - Compacted 4 sstables to [\var\lib\cassandra\data\system\compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b\system-compactions_in_progress-ka-5659,] . 424 bytes to 42 (~9% of original) in 285ms = 0.000141MB/s. 4 total partitions merged to 1. Partition merge counts were Unknown macro: {2} ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,659 CassandraDaemon.java:153 - Exception in thread Thread [NonPeriodicTasks:1,5,main] org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137339-Index.db: The process cannot access the file because it is being used by another process. at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~ [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~ [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~ [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~ [apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~ [apache-cassandra-2.1.1.jar:2.1.1] at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~ [na:1.7.0_71] at java.util.concurrent.FutureTask.run(Unknown Source) ~ [na:1.7.0_71] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~ [na:1.7.0_71] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~ [na:1.7.0_71] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_71] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_71] at java.lang.Thread.run(Unknown Source) [na:1.7.0_71] Caused by: java.nio.file.FileSystemException: \var\lib\cassandra\data\CM\Alerts\CM-Alerts-ka-137339-Index.db: The process cannot access the file because it is being used by another process. at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) ~ [na:1.7.0_71] at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~ [na:1.7.0_71] at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~ [na:1.7.0_71] at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source) ~ [na:1.7.0_71] at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source) ~ [na:1.7.0_71] at java.nio.file.Files.delete(Unknown Source) ~ [na:1.7.0_71] at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~ [apache-cassandra-2.1.1.jar:2.1.1] ... 11 common frames omitted ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:02,925 StorageService.java:377 - Stopping gossiper WARN [NonPeriodicTasks:1] 2015-01-30 22:02:02,925 StorageService.java:291 - Stopping gossip by operator request INFO [NonPeriodicTasks:1] 2015-01-30 22:02:02,925 Gossiper.java:1317 - Announcing shutdown ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:04,925 StorageService.java:382 - Stopping RPC server INFO [NonPeriodicTasks:1] 2015-01-30 22:02:04,925 ThriftServer.java:142 - Stop listening to thrift clients ERROR [NonPeriodicTasks:1] 2015-01-30 22:02:04,956 StorageService.java:387 - Stopping native transport INFO [NonPeriodicTasks:1] 2015-01-30 22:02:05,519 Server.java:213 - Stop listening for CQL clients I've read that 3.0 will fix the root cause of this but is there any workaround I could do about this? I'll attach my log files Attached https://issues.apache.org/jira/secure/attachment/12695903/CassandraDiedWithDiskAccessModeStandardLogs.7z
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          I assume you didn't install the desktop experience and have Windows Defender on there? Alternatively, do you have another file system driver resident from an antivirus software (or other NIPS / IPS software) that could be causing this? Similarly, you guys didn't dig down into the roles and enable Windows Search I assume?

          The "Unable to delete" messages you're seeing are business as usual and those will clear up, however the failure on deleteWithConfirm (as in the other cases) indicates your data file deleted without issue but something had a handle to that index file when it went to delete it. We have yet to have a confirmed instance of this occurring without some file system driver in the mix, so I'd look to that first.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - I assume you didn't install the desktop experience and have Windows Defender on there? Alternatively, do you have another file system driver resident from an antivirus software (or other NIPS / IPS software) that could be causing this? Similarly, you guys didn't dig down into the roles and enable Windows Search I assume? The "Unable to delete" messages you're seeing are business as usual and those will clear up, however the failure on deleteWithConfirm (as in the other cases) indicates your data file deleted without issue but something had a handle to that index file when it went to delete it. We have yet to have a confirmed instance of this occurring without some file system driver in the mix, so I'd look to that first.
          Hide
          choweiyuan Wei Yuan Cho added a comment -

          Hi Joshua, thank you for your reply.

          We have Windows Defender here, I'll exclude Cassandra folder, will that be good enough?
          We do not have Windows Search nor antivirus here. I have checked file system driver, nothing out of the ordinary as far as I can see. I do not mind posting a screenshot of it if required.

          Is it technically possible to retrieve what program has a handle on the index file so that it can be printed to the logs should that error occur? That'd be helpful for debugging issues like this I suppose

          Show
          choweiyuan Wei Yuan Cho added a comment - Hi Joshua, thank you for your reply. We have Windows Defender here, I'll exclude Cassandra folder, will that be good enough? We do not have Windows Search nor antivirus here. I have checked file system driver, nothing out of the ordinary as far as I can see. I do not mind posting a screenshot of it if required. Is it technically possible to retrieve what program has a handle on the index file so that it can be printed to the logs should that error occur? That'd be helpful for debugging issues like this I suppose
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          The best current method I have is to use msinfo32 to check for file system drivers and to deduce from there. I'll see if there's some way to query from user-space - handle from sysinternals lets you run a standalone query for what has file handles open but running that manually isn't realistic as you'll miss transient handles and most of sysinternals stuff is implemented as kernel drivers, so if we find some route to get this same information we'll likely have to take a different route.

          Excluding the Cassandra folder from Windows Defender is highly recommended but I can't speak to the rest of your environment w/regards to file system drivers.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - The best current method I have is to use msinfo32 to check for file system drivers and to deduce from there. I'll see if there's some way to query from user-space - handle from sysinternals lets you run a standalone query for what has file handles open but running that manually isn't realistic as you'll miss transient handles and most of sysinternals stuff is implemented as kernel drivers, so if we find some route to get this same information we'll likely have to take a different route. Excluding the Cassandra folder from Windows Defender is highly recommended but I can't speak to the rest of your environment w/regards to file system drivers.
          Hide
          choweiyuan Wei Yuan Cho added a comment -

          Attached my file systems driver. Anything that might affect Cassandra there?

          Show
          choweiyuan Wei Yuan Cho added a comment - Attached my file systems driver. Anything that might affect Cassandra there?
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          This piqued my interest - it seems we could write a stand-alone .dll and distribute it with Cassandra, making use of the Restart Manager to grab a list of who has file handles open. Even then, I'm not sure that would address our case where file system drivers are causing transient failures as I can't guarantee fs hooks show up in the queried RmGetList results in the first place and and we'd likely race with that query anyway - but it's worth a shot.

          I'll put this on my list of things to look into, likely around the 3.1 era.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - This piqued my interest - it seems we could write a stand-alone .dll and distribute it with Cassandra, making use of the Restart Manager to grab a list of who has file handles open . Even then, I'm not sure that would address our case where file system drivers are causing transient failures as I can't guarantee fs hooks show up in the queried RmGetList results in the first place and and we'd likely race with that query anyway - but it's worth a shot. I'll put this on my list of things to look into, likely around the 3.1 era.
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Nothing in that list jumps out as a problem. I take it you've had continued index file deletion failures after disabling Windows Defender? If you could produce that error and run handle against it - if something other than java shows up that might help tell us something.

          If you're still having the problem after excluding cassandra folders from Windows Defender, I'd try to stop the Windows Defender service entirely from within PowerShell as an administrator:

          Set-MpPreference -DisableRealtimeMonitoring $true
          
          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Nothing in that list jumps out as a problem. I take it you've had continued index file deletion failures after disabling Windows Defender? If you could produce that error and run handle against it - if something other than java shows up that might help tell us something. If you're still having the problem after excluding cassandra folders from Windows Defender, I'd try to stop the Windows Defender service entirely from within PowerShell as an administrator: Set-MpPreference -DisableRealtimeMonitoring $true
          Hide
          amichai Amichai Rothman added a comment -

          I'm seeing the same exceptions. Running on a modern.ie virtualbox Windows 7 (32-bit) image with Oracle JDK 1.8.0u31 and a few utilities installed but no antivirus. The issue is recreated easily when running the test project attached to CASSANDRA-8812 (with an added cassandra-all 2.1.2 or 2.1.3 in the pom, with or without the 8812 patch). The errors show up ~90% of the time when running the test. I tried running the 'handle' utility many times during the tests, but all of the cassandra data files show up as being used only by the java process (I also ran the test with 'mvn test -DforkMode=never' to rule out any strange interaction between the maven JVM and the test JVM.) It happens not only with deletions but sometimes also with move/rename:

          Caused by: java.nio.file.FileSystemException: target\embeddedCassandra\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmp-ka-9-Index.db -> target\embeddedCassandra\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-9-Index.db: The process cannot access the file because it is being used by another process.
          	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.8.0_31]
          	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.8.0_31]
          	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[na:1.8.0_31]
          	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[na:1.8.0_31]
          	at java.nio.file.Files.move(Files.java:1395) ~[na:1.8.0_31]
          	at org.apache.cassandra.io.util.FileUtils.atomicMoveWithFallback(FileUtils.java:184) ~[cassandra-all-2.1.3-SNAPSHOT.jar:2.1.3-SNAPSHOT]
          	at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:166) ~[cassandra-all-2.1.3-SNAPSHOT.jar:2.1.3-SNAPSHOT]
          	... 18 common frames omitted
          

          If there's any other info I can provide that might help, or run the test with any patches, it's no problem to recreate. You might want to change the resolution/status.

          Show
          amichai Amichai Rothman added a comment - I'm seeing the same exceptions. Running on a modern.ie virtualbox Windows 7 (32-bit) image with Oracle JDK 1.8.0u31 and a few utilities installed but no antivirus. The issue is recreated easily when running the test project attached to CASSANDRA-8812 (with an added cassandra-all 2.1.2 or 2.1.3 in the pom, with or without the 8812 patch). The errors show up ~90% of the time when running the test. I tried running the 'handle' utility many times during the tests, but all of the cassandra data files show up as being used only by the java process (I also ran the test with 'mvn test -DforkMode=never' to rule out any strange interaction between the maven JVM and the test JVM.) It happens not only with deletions but sometimes also with move/rename: Caused by: java.nio.file.FileSystemException: target\embeddedCassandra\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmp-ka-9-Index.db -> target\embeddedCassandra\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-ka-9-Index.db: The process cannot access the file because it is being used by another process. at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.8.0_31] at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.8.0_31] at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[na:1.8.0_31] at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[na:1.8.0_31] at java.nio.file.Files.move(Files.java:1395) ~[na:1.8.0_31] at org.apache.cassandra.io.util.FileUtils.atomicMoveWithFallback(FileUtils.java:184) ~[cassandra-all-2.1.3-SNAPSHOT.jar:2.1.3-SNAPSHOT] at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:166) ~[cassandra-all-2.1.3-SNAPSHOT.jar:2.1.3-SNAPSHOT] ... 18 common frames omitted If there's any other info I can provide that might help, or run the test with any patches, it's no problem to recreate. You might want to change the resolution/status.
          Hide
          amichai Amichai Rothman added a comment -

          Is it possible that the issue isn't really with another process, but just another thread in the same process or some fd leak from not closing/releasing it properly somewhere under some circumstances?

          Show
          amichai Amichai Rothman added a comment - Is it possible that the issue isn't really with another process, but just another thread in the same process or some fd leak from not closing/releasing it properly somewhere under some circumstances?
          Hide
          szymon_kochanski Szymon Kochanski added a comment -

          Reproduced the issue using Cassandra 2.1.2 in Windows 7 and Windows Server 2012 R2 64bits.

          Show
          szymon_kochanski Szymon Kochanski added a comment - Reproduced the issue using Cassandra 2.1.2 in Windows 7 and Windows Server 2012 R2 64bits.
          Hide
          amichai Amichai Rothman added a comment -

          Can someone please reopen this issue? Multiple users can recreate it, but being closed we can't vote for it, and it may be overlooked by devs...

          Show
          amichai Amichai Rothman added a comment - Can someone please reopen this issue? Multiple users can recreate it, but being closed we can't vote for it, and it may be overlooked by devs...
          Hide
          amichai Amichai Rothman added a comment -

          Added recreate-CASSANDRA-8390.tgz test project for your convenience (see env details in previous comment).

          Show
          amichai Amichai Rothman added a comment - Added recreate- CASSANDRA-8390 .tgz test project for your convenience (see env details in previous comment).
          Hide
          amichai Amichai Rothman added a comment -

          fwiw, I just recreated the environment from scratch to make sure no third-party software is causing this:
          1. Installed the Windows 7 VirtualBox VM from scratch (2G RAM, 2 processors).
          2. Installed jdk-8u40-windows-i586.exe.
          3. Installed (extracted) apache-maven-3.2.5.zip.
          4. Added jdk and maven bin folders to system path, and added JAVA_HOME env variable.
          5. Extracted the attached project (converted to zip so it opens natively on Windows).
          6. Ran 'mvn test' from console in project folder.

          Sure enough, the build failed due to this issue.

          Host is Linux 64-bit (probably doesn't matter), with VM image on an NTFS-formatted spinning disk (maybe speed matters).

          Show
          amichai Amichai Rothman added a comment - fwiw, I just recreated the environment from scratch to make sure no third-party software is causing this: 1. Installed the Windows 7 VirtualBox VM from scratch (2G RAM, 2 processors). 2. Installed jdk-8u40-windows-i586.exe. 3. Installed (extracted) apache-maven-3.2.5.zip. 4. Added jdk and maven bin folders to system path, and added JAVA_HOME env variable. 5. Extracted the attached project (converted to zip so it opens natively on Windows). 6. Ran 'mvn test' from console in project folder. Sure enough, the build failed due to this issue. Host is Linux 64-bit (probably doesn't matter), with VM image on an NTFS-formatted spinning disk (maybe speed matters).
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          I've reproduced on modern.ie and locally on dev laptop using attached test, however these errors don't occur on trunk likely due to CASSANDRA-4050 mitigating file sharing problems between readers and writers. With CASSANDRA-8709, CASSANDRA-7066, and CASSANDRA-8893 in the works we will have even further protection against these types of errors in 3.0 on Windows.

          Unfortunately, the RandomAccessFile to FileChannel conversions are invasive enough that we're not back-porting them to the 2.1 line, so we're going to have to sit tight for the 3.0 release to address these errors.

          Closing as duplicate of CASSANDRA-4050, although the symptoms have changed over time with our I/O path.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - I've reproduced on modern.ie and locally on dev laptop using attached test, however these errors don't occur on trunk likely due to CASSANDRA-4050 mitigating file sharing problems between readers and writers. With CASSANDRA-8709 , CASSANDRA-7066 , and CASSANDRA-8893 in the works we will have even further protection against these types of errors in 3.0 on Windows. Unfortunately, the RandomAccessFile to FileChannel conversions are invasive enough that we're not back-porting them to the 2.1 line, so we're going to have to sit tight for the 3.0 release to address these errors. Closing as duplicate of CASSANDRA-4050 , although the symptoms have changed over time with our I/O path.
          Hide
          amichai Amichai Rothman added a comment -

          Is there any possible workaround that a client can use until we get to 3.0?

          Show
          amichai Amichai Rothman added a comment - Is there any possible workaround that a client can use until we get to 3.0?
          Hide
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Your question got me to thinking - CASSANDRA-8535 is a short-term fix for a similar issue that closes the window of time where we're juggling readers and writers on Windows; the test program on this ticket succeeds without errors if I patch with v2 from that ticket.

          I hope to have that ironed out this week so it should make it into 2.1.4.

          Show
          JoshuaMcKenzie Joshua McKenzie added a comment - Your question got me to thinking - CASSANDRA-8535 is a short-term fix for a similar issue that closes the window of time where we're juggling readers and writers on Windows; the test program on this ticket succeeds without errors if I patch with v2 from that ticket. I hope to have that ironed out this week so it should make it into 2.1.4.
          Hide
          amichai Amichai Rothman added a comment -

          That's wonderful!

          I can confirm that it works - I applied the CASSANDRA-8535 v2 patch, along with the patch from CASSANDRA-8812, to the 2.1.3 tag and the test project which previously failed on both issues now always succeeds.

          Thanks Joshua McKenzie for you work on these issues! Looking forward to a 2.1.4 that works on Windows...

          Until then, I've added my vote to CASSANDRA-8535 for good morale

          Show
          amichai Amichai Rothman added a comment - That's wonderful! I can confirm that it works - I applied the CASSANDRA-8535 v2 patch, along with the patch from CASSANDRA-8812 , to the 2.1.3 tag and the test project which previously failed on both issues now always succeeds. Thanks Joshua McKenzie for you work on these issues! Looking forward to a 2.1.4 that works on Windows... Until then, I've added my vote to CASSANDRA-8535 for good morale
          Hide
          maximp Maxim Podkolzine added a comment - - edited

          We have tested Cassandra 2.1.4 and still see "The process cannot access the file because it is being used by another process" every 2-3 hours.
          Though this time Cassandra does not fail, but just stops gently.
          Does it mean that patches for CASSANDRA-8535, CASSANDRA-8812 and others were not included in 2.1.4?

          UPD: I see from commits that these patches got into 2.1 branch, but not into 2.1.4

          Show
          maximp Maxim Podkolzine added a comment - - edited We have tested Cassandra 2.1.4 and still see "The process cannot access the file because it is being used by another process" every 2-3 hours. Though this time Cassandra does not fail, but just stops gently. Does it mean that patches for CASSANDRA-8535 , CASSANDRA-8812 and others were not included in 2.1.4? UPD: I see from commits that these patches got into 2.1 branch, but not into 2.1.4

            People

            • Assignee:
              JoshuaMcKenzie Joshua McKenzie
              Reporter:
              ilya.komolkin Ilya Komolkin
            • Votes:
              6 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development