Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-6716

snapshots constantly fail with "Tried to hard link to file that does not exist"

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Cannot Reproduce
    • None
    • None
    • None
    • Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7

    • Normal

    Description

      It seems that since recently I have started getting a number of exceptions like "File not found" on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs.

      I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes, at some point it fails because of the file that it can't find.

      One one of the nodes currently the "nodetool scrub" command fails instantly and consistently with this exception:

      # /opt/cassandra/bin/nodetool scrub 
      Exception in thread "main" java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db
      	at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
      	at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215)
      	at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826)
      	at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122)
      	at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
      	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
      	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
      	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
      	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
      	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
      	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
      	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
      	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
      	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
      	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
      	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
      	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
      	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
      	at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
      	at sun.rmi.transport.Transport$1.run(Transport.java:177)
      	at sun.rmi.transport.Transport$1.run(Transport.java:174)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
      	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:724)
      

      Also I have noticed that the files that are missing are often (or maybe always?) referred to in the log as follows:

      WARN 00:06:10,597 At level 3, SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db') [DecoratedKey(-9053060597280257896, 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300), DecoratedKey(-8855915848970248008, 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)] overlaps SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db') [DecoratedKey(-8964446543595889729, 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100), DecoratedKey(-8848132752710859808, 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable
      WARN [RMI TCP Connection(2)-10.3.45.158] 2014-02-18 00:06:10,597 LeveledManifest.java (line 171) At level 3, SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db') [DecoratedKey(-9053060597280257896, 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300), DecoratedKey(-8855915848970248008, 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)] overlaps SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db') [DecoratedKey(-8964446543595889729, 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100), DecoratedKey(-8848132752710859808, 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable

      I never had anything but Cassandra 2.0 on these systems. Also I have recreated my test data from scratch with 2.0.4.

      Attachments

        1. system.log.gz
          729 kB
          Nikolai Grigoriev

        Activity

          People

            Unassigned Unassigned
            ngrigoriev Nikolai Grigoriev
            Votes:
            3 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: