Cassandra / CASSANDRA-2307

IndexOutOfBoundsException during compaction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Debian 5.0.8, Linux 2.6.26-2-amd64, 4GB of ram assigned to Cassandra, JRE 1.6.0_24

      Description

      We're getting an IndexOutOfBounds exception when compacting.

      Here's the detailed error we get on-screen when running "nodetool -h 10.3.133.10 compact":

      Error occured while compacting keyspace test
      java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
      at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
      at java.util.concurrent.FutureTask.get(Unknown Source)
      at org.apache.cassandra.db.CompactionManager.performMajor(CompactionManager.java:186)
      at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1678)
      at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1248)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
      at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
      at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
      at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
      at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
      at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
      at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
      at sun.rmi.transport.Transport$1.run(Unknown Source)
      at java.security.AccessController.doPrivileged(Native Method)
      at sun.rmi.transport.Transport.serviceCall(Unknown Source)
      at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
      at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
      at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.IndexOutOfBoundsException
      at java.nio.Buffer.checkIndex(Unknown Source)
      at java.nio.HeapByteBuffer.getInt(Unknown Source)
      at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:822)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:809)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:800)
      at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:94)
      at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
      at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
      at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
      at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
      at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:427)
      at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:217)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      ... 3 more

      And here's the error I'm getting in my log file:

      ERROR [CompactionExecutor:1] 2011-03-09 19:16:52,299 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
      java.lang.IndexOutOfBoundsException
      at java.nio.Buffer.checkIndex(Unknown Source)
      at java.nio.HeapByteBuffer.getInt(Unknown Source)
      at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:822)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:809)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:800)
      at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:94)
      at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
      at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
      at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
      at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
      at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:427)
      at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:217)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      We run Cassandra 0.7.3, but we had the issue with Cassandra 0.7.2 as well. We have 8 machines in the cluster; the error happens on only one of them, at least for now.
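
      For readers following the trace: the two innermost frames are Buffer.checkIndex and HeapByteBuffer.getInt, which is where an absolute 4-byte read fails when fewer than 4 bytes are available at the requested index. The sketch below is a minimal illustration, assuming the deleted column's value buffer is expected to hold a 4-byte deletion time. The class and method names (ShortBufferDemo, readIntAtPosition, hasFullInt) are hypothetical, and this is not Cassandra's code.

```java
import java.nio.ByteBuffer;

class ShortBufferDemo {
    // Absolute read of a 4-byte int at the buffer's current position.
    // Throws IndexOutOfBoundsException if fewer than 4 bytes fit before the limit.
    static int readIntAtPosition(ByteBuffer value) {
        return value.getInt(value.position());
    }

    // Defensive check a caller could make before attempting the read.
    static boolean hasFullInt(ByteBuffer value) {
        return value.remaining() >= 4;
    }

    public static void main(String[] args) {
        // A well-formed 4-byte value reads back without error.
        ByteBuffer wellFormed = ByteBuffer.allocate(4);
        wellFormed.putInt(0, 1299698212);
        System.out.println("well-formed value: " + readIntAtPosition(wellFormed));

        // A value that is only 2 bytes long (for example, truncated or
        // misread from disk) fails in the same way as the reported trace.
        ByteBuffer truncated = ByteBuffer.allocate(2);
        System.out.println("has full int: " + hasFullInt(truncated));
        try {
            readIntAtPosition(truncated);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("short value: " + e);
        }
    }
}
```

      If that assumption holds, the exception points at a value of unexpected length in one of the sstables being compacted rather than at the compaction logic itself.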

        Activity

        Mike Nadeau added a comment -

        I will be happy to attach sstables; I just need some help determining which files to upload. Here are the files I have under data:

        seen2-f-242-Data.db
        seen2-f-242-Filter.db
        seen2-f-242-Index.db
        seen2-f-242-Statistics.db
        seen2-f-583-Data.db
        seen2-f-583-Filter.db
        seen2-f-583-Index.db
        seen2-f-583-Statistics.db
        seen2-f-673-Data.db
        seen2-f-673-Filter.db
        seen2-f-673-Index.db
        seen2-f-673-Statistics.db
        seen2-f-694-Data.db
        seen2-f-694-Filter.db
        seen2-f-694-Index.db
        seen2-f-694-Statistics.db
        seen2-f-715-Data.db
        seen2-f-715-Filter.db
        seen2-f-715-Index.db
        seen2-f-715-Statistics.db
        seen2-f-720-Compacted
        seen2-f-720-Data.db
        seen2-f-720-Filter.db
        seen2-f-720-Index.db
        seen2-f-720-Statistics.db
        seen2-f-725-Compacted
        seen2-f-725-Data.db
        seen2-f-725-Filter.db
        seen2-f-725-Index.db
        seen2-f-725-Statistics.db
        seen2-f-726-Compacted
        seen2-f-726-Data.db
        seen2-f-726-Filter.db
        seen2-f-726-Index.db
        seen2-f-726-Statistics.db
        seen2-f-727-Compacted
        seen2-f-727-Data.db
        seen2-f-727-Filter.db
        seen2-f-727-Index.db
        seen2-f-727-Statistics.db
        seen2-f-728-Compacted
        seen2-f-728-Data.db
        seen2-f-728-Filter.db
        seen2-f-728-Index.db
        seen2-f-728-Statistics.db
        seen2-f-729-Compacted
        seen2-f-729-Data.db
        seen2-f-729-Filter.db
        seen2-f-729-Index.db
        seen2-f-729-Statistics.db
        seen2-f-730-Compacted
        seen2-f-730-Data.db
        seen2-f-730-Filter.db
        seen2-f-730-Index.db
        seen2-f-730-Statistics.db
        seen2-f-731-Compacted
        seen2-f-731-Data.db
        seen2-f-731-Filter.db
        seen2-f-731-Index.db
        seen2-f-731-Statistics.db
        seen2-f-732-Compacted
        seen2-f-732-Data.db
        seen2-f-732-Filter.db
        seen2-f-732-Index.db
        seen2-f-732-Statistics.db
        seen2-f-733-Compacted
        seen2-f-733-Data.db
        seen2-f-733-Filter.db
        seen2-f-733-Index.db
        seen2-f-733-Statistics.db
        seen2-f-734-Compacted
        seen2-f-734-Data.db
        seen2-f-734-Filter.db
        seen2-f-734-Index.db
        seen2-f-734-Statistics.db
        seen2-f-735-Compacted
        seen2-f-735-Data.db
        seen2-f-735-Filter.db
        seen2-f-735-Index.db
        seen2-f-735-Statistics.db
        seen2-f-736-Data.db
        seen2-f-736-Filter.db
        seen2-f-736-Index.db
        seen2-f-736-Statistics.db

        Total size is 14G

        Sébastien Giroux added a comment -

        Did you try running "nodetool -h localhost scrub" on that node? I'm not sure you're running into data corruption, but I remember having a similar problem before running scrub. It may be something else, but it's worth a try.

        Mike Nadeau added a comment -

        I ran into this when running the scrub:

        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space

        I'm increasing the heap to 10G (it was 4G) and rerunning it now.

        Mike Nadeau added a comment -

        OK, the scrub was successful with 10G of heap. Now retrying the compaction. It's the first time I've run a scrub; what does it do exactly?

        Mike Nadeau added a comment -

        No luck, I'm still getting this error during the compaction:

        Error occured during compaction
        java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.apache.cassandra.db.CompactionManager.performMajor(CompactionManager.java:209)
        at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1720)
        at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1263)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
        at sun.rmi.transport.Transport$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
        Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Unknown Source)
        at java.nio.HeapByteBuffer.getInt(Unknown Source)
        at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:852)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:839)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:830)
        at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:94)
        at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:449)
        at org.apache.cassandra.db.CompactionManager$4.call(CompactionManager.java:240)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        ... 3 more

        I'll be happy to provide SSTables; I just don't know which files would be helpful.

        Sylvain Lebresne added a comment -

        Check the log. Just before the exceptions, you should see a message saying 'Compacting [...]' with a number of -Data.db files. Those are the useful ones. It would be best if you also attached the matching filter, index and statistics files (i.e., the files with the same number but ending in -Filter.db, -Index.db and -Statistics.db instead of -Data.db).
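
        The file-gathering step described here can be sketched roughly as follows. This is only an illustration of the naming rule stated in this comment (same prefix and number, different suffix); the class and method names (SSTableComponents, componentsFor) are hypothetical, and the example file name is taken from the listing earlier in this ticket.

```java
import java.util.ArrayList;
import java.util.List;

class SSTableComponents {
    static final String DATA_SUFFIX = "-Data.db";
    static final String[] SIBLING_SUFFIXES = {"-Filter.db", "-Index.db", "-Statistics.db"};

    // Given one -Data.db file name from the 'Compacting [...]' log line,
    // returns that name followed by the names of its sibling components.
    static List<String> componentsFor(String dataFileName) {
        if (!dataFileName.endsWith(DATA_SUFFIX)) {
            throw new IllegalArgumentException("not a -Data.db file: " + dataFileName);
        }
        // Everything before "-Data.db" is shared by all components of one sstable.
        String prefix = dataFileName.substring(0, dataFileName.length() - DATA_SUFFIX.length());
        List<String> names = new ArrayList<String>();
        names.add(dataFileName);
        for (String suffix : SIBLING_SUFFIXES) {
            names.add(prefix + suffix);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the four files to collect for one sstable.
        for (String name : componentsFor("seen2-f-736-Data.db")) {
            System.out.println(name);
        }
    }
}
```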

        Mike Nadeau added a comment -

        I found out which files they are; I'm uploading them somewhere at the moment (they're quite big).

        Does anyone know if we can always safely remove the "snapshots" folder? I think it was created by my nodetool repair command (which I interrupted).

        Sylvain Lebresne added a comment -

        Does anyone know if we can always safely remove the "snapshots" folder? I think it was created by my nodetool repair command (which I interrupted).

        It's more likely the pre-scrub snapshot. It contains your sstables as they were before the scrub command. It's unclear whether the scrub did anything in your case, but if that data is important I would suggest you keep it until we figure out what is going on with your sstables.

        Mike Nadeau added a comment -

        Here are my SSTables -
        http://184.107.12.190/DATA.tgz

        The archive size is 2.5GB, extracted it's around 12GB.

        By the way, my issue might be a bad hard disk... I saw a couple of these exceptions in the log:

        org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        And
        Caused by: java.io.IOException: Corrupt (negative) value length encountered

        I moved all my data to a hard disk I know to be healthy and restarted Cassandra. I just hope we can repair everything!
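
        Both messages quoted above complain about a length field that cannot be valid, which is what damaged bytes on disk tend to produce in a length-prefixed format. The sketch below is a simplified illustration of that kind of check and is not Cassandra's ColumnSerializer: the field widths (a 16-bit name length, a 32-bit value length), the error texts, and all names (LengthPrefixedReader, readName, readValue, describe) are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class LengthPrefixedReader {
    // Reads a name stored as an unsigned 16-bit length followed by that many bytes.
    // A zero length is treated as corruption.
    static byte[] readName(DataInputStream in) throws IOException {
        int length = in.readUnsignedShort();
        if (length == 0) {
            throw new IOException("invalid name length 0");
        }
        byte[] name = new byte[length];
        in.readFully(name);
        return name;
    }

    // Reads a value stored as a signed 32-bit length followed by that many bytes.
    // A negative length is treated as corruption.
    static byte[] readValue(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("corrupt (negative) value length " + length);
        }
        byte[] value = new byte[length];
        in.readFully(value);
        return value;
    }

    // Decodes one name/value pair and reports either success or the reason it was rejected.
    static String describe(byte[] column) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(column));
            byte[] name = readName(in);
            byte[] value = readValue(in);
            return "ok: name=" + name.length + " bytes, value=" + value.length + " bytes";
        } catch (IOException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Well-formed: name "k" (length 1), value "hi" (length 2).
        System.out.println(describe(new byte[] {0, 1, 'k', 0, 0, 0, 2, 'h', 'i'}));
        // Name length of 0.
        System.out.println(describe(new byte[] {0, 0}));
        // Value length with the sign bit set.
        System.out.println(describe(new byte[] {0, 1, 'k', (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF}));
    }
}
```

        Checks like these catch only some damage; a corrupted length that still looks plausible would pass them and surface later as a different error.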

        Sylvain Lebresne added a comment -

        2.5GB, ouch

        I'll try to have a look at those sstables. In the meantime, if only one node is having problems, and unless your replication factor is 1, your safest and probably quickest way back to a correct state may well be to replace the node with the potentially bad disk.

        Mike Nadeau added a comment -

        My replication factor is 3 and I have 7 other healthy nodes.

        Could you help me with the procedure to recover the node? It's the first time I've faced data corruption with Cassandra. One of my concerns is that my corrupted node is my only node with bootstrap=false, and it's the seed for all the other nodes. Right now I have only one seed; I never needed more, and frankly I'm not sure of the rules for deciding how many seeds a cluster should have.

        Here's my guess for the procedure; please tell me if I have anything wrong:
        1- Stop the corrupted node (let's call it node1)
        2- Change bootstrap to false on another node (let's call it node2), remove seed
        3- On all other nodes, replace seed (node1) with node2, keep bootstrap set to true
        4- Restart node2
        5- Restart all other nodes, they should be fine
        6- Flush node1's data and restart it; it should get its data

        I might be worrying too much about the bootstrap/seed settings; I still don't fully understand them.

        Thanks.

        Sébastien Giroux added a comment -

        org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0

        This is the exception I had BEFORE running scrub. Scrub is supposed to rebuild your sstables while fixing some corruption issues (if any). Are you sure scrub had finished running on all the problematic sstables when you saw this exception? It can take a little while to run, especially on a large dataset.

        Mike Nadeau added a comment -

        Yes, the scrub finished running, and it seems to have done a lot of work, but I still get the exception.

        I'll go ahead and try to replace the node.

        Jonathan Ellis added a comment -

        Are you using TTLs in this CF?

        Mike Nadeau added a comment -

        No TTL.

        It's a very simple CF for which we generate a unique primary key (no timeUUID). Each document is very small, maybe around 300 bytes in total on average.

        Jonathan Ellis added a comment -

        Are you doing deletes at all in that CF?

        Mike Nadeau added a comment -

        No deletes.

        Jonathan Ellis added a comment -

        Please re-open if you can reproduce on 1.0.10. I suspect this was fixed by something like CASSANDRA-3957.

        Tommi Koivula added a comment -

        We're getting exactly the same error when running a forced major compaction with Cassandra 1.1.0. One of the CFs is producing this error every time. Compaction works for other CFs. There is only one node in this cluster.

        Cassandra version: 1.1.0
        Environment: Ubuntu 12.04, 64-bit Sun JVM 1.6.0_33-b04

        Stack trace:

        java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at com.profium.rdfcassandra.CassandraBridge.compact(Unknown Source)
        at com.profium.rdfcassandra.CassandraBridge.addColumnFamily(Unknown Source)
        at com.profium.rdfcassandra.CassandraBackedIntegerPersistence.<init>(Unknown Source)
        at com.profium.sir.ctx.ReasonerModule.createIntegerPersistence(ReasonerModule.java:149)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.profium.ioc.exec.PiocBeanFactory.create(PiocBeanFactory.java:528)
        at com.profium.ioc.exec.PiocBeanFactory.getOrCreate(PiocBeanFactory.java:456)
        at com.profium.ioc.exec.PiocBeanDefinitionExecutor.executeCreate(PiocBeanDefinitionExecutor.java:32)
        at com.profium.ioc.ctx.ApplicationContextFactory.createApplicationContext(ApplicationContextFactory.java:186)
        at com.profium.ioc.ctx.ApplicationContextFactory.initApplicationContext(ApplicationContextFactory.java:382)
        at SIR.exec.SirImpl.doChangeMdsRunlevel(SirImpl.java:799)
        at SIR.exec.SirImpl.raiseToRunlevel(SirImpl.java:685)
        at com.profium.sir.exec.MdsMonitorThread$1.run(MdsMonitorThread.java:84)
        at java.lang.Thread.run(Thread.java:662)
        Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:518)
        at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:340)
        at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:50)
        at org.apache.cassandra.db.Column.isMarkedForDelete(Column.java:110)
        at org.apache.cassandra.db.Column.reconcile(Column.java:207)
        at org.apache.cassandra.db.DeletedColumn.reconcile(DeletedColumn.java:58)
        at org.apache.cassandra.db.ArrayBackedSortedColumns.resolveAgainst(ArrayBackedSortedColumns.java:168)
        at org.apache.cassandra.db.ArrayBackedSortedColumns.addAllColumns(ArrayBackedSortedColumns.java:232)
        at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.addAll(AbstractThreadUnsafeSortedColumns.java:98)
        at org.apache.cassandra.db.AbstractColumnContainer.addAll(AbstractColumnContainer.java:92)
        at org.apache.cassandra.db.AbstractColumnContainer.addAll(AbstractColumnContainer.java:97)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:126)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:145)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:97)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:82)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:177)
        at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:360)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        ... 1 more
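
The innermost frames of the trace above (Buffer.checkIndex, HeapByteBuffer.getInt, DeletedColumn.getLocalDeletionTime) suggest that a 4-byte int was read from a column value that held fewer than 4 bytes. The following is only a minimal sketch of that failure mode using the plain JDK; the class ShortBufferRead and its methods are hypothetical stand-ins, not Cassandra's actual code, and the assumption that the value buffer was too short is inferred from the trace rather than confirmed in this ticket.

```java
import java.nio.ByteBuffer;

public class ShortBufferRead {
    // Hypothetical stand-in for reading a 4-byte local deletion time
    // from a column value; not the real DeletedColumn implementation.
    static int readLocalDeletionTime(ByteBuffer value) {
        // Absolute read of 4 bytes starting at the buffer's current position.
        // Throws IndexOutOfBoundsException if fewer than 4 bytes are available.
        return value.getInt(value.position());
    }

    // Returns true when the value holds at least the 4 bytes an int read needs.
    static boolean hasDeletionTime(ByteBuffer value) {
        return value.remaining() >= 4;
    }

    public static void main(String[] args) {
        // A well-formed value: exactly 4 bytes holding an int timestamp.
        ByteBuffer wellFormed = ByteBuffer.allocate(4).putInt(0, 1300000000);
        System.out.println("well-formed: " + readLocalDeletionTime(wellFormed));

        // A zero-length value: the same read fails inside the JDK bounds check.
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println("empty has deletion time? " + hasDeletionTime(empty));
        try {
            readLocalDeletionTime(empty);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("short value: " + e.getClass().getName());
        }
    }
}
```

If this is what happened, it would point to a tombstone whose stored value is malformed or truncated on disk, which fits the advice elsewhere in this thread to run scrub before compacting.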

        Jonathan Ellis added a comment -

        Commenting on a bug filed against a two-year-old release, for an eight-month-old .0 version, is probably not the best way to address this.

        If you can reproduce with 1.1.8 then please open a new ticket. Standard advice of running scrub first applies as above.


          People

          • Assignee:
            Unassigned
            Reporter:
            Mike Nadeau
          • Votes:
            0
            Watchers:
            2
