CASSANDRA-2307

IndexOutOfBoundsException during compaction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Debian 5.0.8, Linux 2.6.26-2-amd64, 4GB of ram assigned to Cassandra, JRE 1.6.0_24

      Description

      We're getting an IndexOutOfBounds exception when compacting.

      Here's the detailed error we get on-screen when running "nodetool -h 10.3.133.10 compact":

      Error occured while compacting keyspace test
      java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
      at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
      at java.util.concurrent.FutureTask.get(Unknown Source)
      at org.apache.cassandra.db.CompactionManager.performMajor(CompactionManager.java:186)
      at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1678)
      at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1248)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
      at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
      at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
      at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
      at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
      at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
      at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
      at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
      at sun.rmi.transport.Transport$1.run(Unknown Source)
      at java.security.AccessController.doPrivileged(Native Method)
      at sun.rmi.transport.Transport.serviceCall(Unknown Source)
      at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
      at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
      at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.IndexOutOfBoundsException
      at java.nio.Buffer.checkIndex(Unknown Source)
      at java.nio.HeapByteBuffer.getInt(Unknown Source)
      at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:822)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:809)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:800)
      at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:94)
      at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
      at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
      at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
      at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
      at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:427)
      at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:217)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      ... 3 more

      And here's the error I'm getting in my log file:

      ERROR [CompactionExecutor:1] 2011-03-09 19:16:52,299 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
      java.lang.IndexOutOfBoundsException
      at java.nio.Buffer.checkIndex(Unknown Source)
      at java.nio.HeapByteBuffer.getInt(Unknown Source)
      at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:822)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:809)
      at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:800)
      at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:94)
      at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
      at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
      at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
      at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
      at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
      at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:427)
      at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:217)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      We run Cassandra 0.7.3, but we had the issue with Cassandra 0.7.2 as well. We have 8 machines in the cluster, and the error happens on only one of them, at least for now.
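
Both traces bottom out in `HeapByteBuffer.getInt`, called from `DeletedColumn.getLocalDeletionTime`. In Java, an absolute `getInt` needs four readable bytes at the given index and throws `IndexOutOfBoundsException` otherwise, so the trace is consistent with a deleted-column value that is shorter than four bytes. The sketch below is only a Python analogue of that bounds check, not Cassandra code; the function name `read_local_deletion_time` and the big-endian 4-byte layout are assumptions made for illustration.

```python
import struct


def read_local_deletion_time(value: bytes, position: int = 0) -> int:
    """Read a big-endian 32-bit int at `position` with an explicit bounds check.

    Hypothetical helper: raises IndexError when fewer than four bytes are
    available, in the way ByteBuffer.getInt raises IndexOutOfBoundsException.
    """
    if position < 0 or len(value) - position < 4:
        raise IndexError(
            f"need 4 bytes at position {position}, buffer has {len(value)}"
        )
    return struct.unpack_from(">i", value, position)[0]


if __name__ == "__main__":
    # A well-formed 4-byte value decodes normally.
    print(read_local_deletion_time(struct.pack(">i", 1299698212)))
    try:
        # A truncated value fails the bounds check instead of decoding garbage.
        read_local_deletion_time(b"\x00\x01")
    except IndexError as exc:
        print("truncated value rejected:", exc)
```

If this reading is right, the failure says more about the bytes on disk than about the compaction code path itself.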

        Activity

        akaris Mike Nadeau added a comment -

        I will be happy to attach sstables; I just need some help determining which files I should upload. Here are the files I have under data:

        seen2-f-242-Data.db
        seen2-f-242-Filter.db
        seen2-f-242-Index.db
        seen2-f-242-Statistics.db
        seen2-f-583-Data.db
        seen2-f-583-Filter.db
        seen2-f-583-Index.db
        seen2-f-583-Statistics.db
        seen2-f-673-Data.db
        seen2-f-673-Filter.db
        seen2-f-673-Index.db
        seen2-f-673-Statistics.db
        seen2-f-694-Data.db
        seen2-f-694-Filter.db
        seen2-f-694-Index.db
        seen2-f-694-Statistics.db
        seen2-f-715-Data.db
        seen2-f-715-Filter.db
        seen2-f-715-Index.db
        seen2-f-715-Statistics.db
        seen2-f-720-Compacted
        seen2-f-720-Data.db
        seen2-f-720-Filter.db
        seen2-f-720-Index.db
        seen2-f-720-Statistics.db
        seen2-f-725-Compacted
        seen2-f-725-Data.db
        seen2-f-725-Filter.db
        seen2-f-725-Index.db
        seen2-f-725-Statistics.db
        seen2-f-726-Compacted
        seen2-f-726-Data.db
        seen2-f-726-Filter.db
        seen2-f-726-Index.db
        seen2-f-726-Statistics.db
        seen2-f-727-Compacted
        seen2-f-727-Data.db
        seen2-f-727-Filter.db
        seen2-f-727-Index.db
        seen2-f-727-Statistics.db
        seen2-f-728-Compacted
        seen2-f-728-Data.db
        seen2-f-728-Filter.db
        seen2-f-728-Index.db
        seen2-f-728-Statistics.db
        seen2-f-729-Compacted
        seen2-f-729-Data.db
        seen2-f-729-Filter.db
        seen2-f-729-Index.db
        seen2-f-729-Statistics.db
        seen2-f-730-Compacted
        seen2-f-730-Data.db
        seen2-f-730-Filter.db
        seen2-f-730-Index.db
        seen2-f-730-Statistics.db
        seen2-f-731-Compacted
        seen2-f-731-Data.db
        seen2-f-731-Filter.db
        seen2-f-731-Index.db
        seen2-f-731-Statistics.db
        seen2-f-732-Compacted
        seen2-f-732-Data.db
        seen2-f-732-Filter.db
        seen2-f-732-Index.db
        seen2-f-732-Statistics.db
        seen2-f-733-Compacted
        seen2-f-733-Data.db
        seen2-f-733-Filter.db
        seen2-f-733-Index.db
        seen2-f-733-Statistics.db
        seen2-f-734-Compacted
        seen2-f-734-Data.db
        seen2-f-734-Filter.db
        seen2-f-734-Index.db
        seen2-f-734-Statistics.db
        seen2-f-735-Compacted
        seen2-f-735-Data.db
        seen2-f-735-Filter.db
        seen2-f-735-Index.db
        seen2-f-735-Statistics.db
        seen2-f-736-Data.db
        seen2-f-736-Filter.db
        seen2-f-736-Index.db
        seen2-f-736-Statistics.db

        Total size is 14G
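
One way to narrow down a listing like this is to group the files by generation number and set aside the generations that carry a `-Compacted` marker. This sketch assumes the `<cf>-f-<generation>-<Component>` naming visible in the listing, and it assumes that a `-Compacted` marker means the sstable has already been superseded by a compaction. The function name `split_live_and_compacted` is hypothetical.

```python
import re
from collections import defaultdict

# Matches names such as "seen2-f-242-Data.db" or "seen2-f-720-Compacted".
SSTABLE_NAME = re.compile(
    r"^(?P<cf>.+)-f-(?P<gen>\d+)-(?P<component>[A-Za-z]+)(?:\.db)?$"
)


def split_live_and_compacted(filenames):
    """Return (live, compacted) lists of generation numbers, sorted ascending."""
    components = defaultdict(set)
    for name in filenames:
        match = SSTABLE_NAME.match(name.strip())
        if match:  # ignore anything that is not an sstable component
            components[int(match.group("gen"))].add(match.group("component"))
    compacted = sorted(g for g, c in components.items() if "Compacted" in c)
    live = sorted(g for g, c in components.items() if "Compacted" not in c)
    return live, compacted


if __name__ == "__main__":
    listing = [
        "seen2-f-242-Data.db", "seen2-f-242-Index.db",
        "seen2-f-720-Compacted", "seen2-f-720-Data.db",
        "seen2-f-736-Data.db",
    ]
    live, compacted = split_live_and_compacted(listing)
    print("live generations:", live)
    print("already compacted:", compacted)
```

Applied to the full listing above, this would separate generations 242, 583, 673, 694, 715 and 736 (no marker) from 720 and 725 through 735 (marker present).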

        wajam Sébastien Giroux added a comment -

        Did you try running "nodetool -h localhost scrub" on that node? I'm not sure you're running into data corruption, but I remember having a similar problem before I ran scrub. It may be something else, but it's worth a try, I suppose!

        akaris Mike Nadeau added a comment -

        I ran into this when running the scrub:

        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space

        I'm increasing the heap to 10G (it was 4G) and rerunning it right now.

        akaris Mike Nadeau added a comment -

        OK, the scrub was successful with 10G of heap. Now retrying the compaction. It's the first time I've run a scrub; what does it do exactly?

        akaris Mike Nadeau added a comment -

        No luck, I'm still getting this error during the compaction:

        Error occured during compaction
        java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.apache.cassandra.db.CompactionManager.performMajor(CompactionManager.java:209)
        at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1720)
        at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1263)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
        at sun.rmi.transport.Transport$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
        Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Unknown Source)
        at java.nio.HeapByteBuffer.getInt(Unknown Source)
        at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:852)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:839)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:830)
        at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:94)
        at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:449)
        at org.apache.cassandra.db.CompactionManager$4.call(CompactionManager.java:240)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        ... 3 more

        I'll be happy to provide SSTables, I just don't know which files would be helpful.

        slebresne Sylvain Lebresne added a comment -

        Check the log. Just before the exceptions, you should see a message saying 'Compacting [...]' listing a number of -Data.db files. Those are the useful ones. It would be best to also include the matching filter, index and statistics files (i.e., files with the same number but ending in -Filter.db, -Index.db and -Statistics.db instead of -Data.db).
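
The file selection described here can be sketched as a small helper that pulls every `-Data.db` path out of a 'Compacting [...]' log line and expands each one into its four component files. This is a minimal sketch: the exact layout of the log line is not assumed (the pattern just looks for tokens ending in `-Data.db`), the sample line is made up, and the function name `files_to_collect` is hypothetical.

```python
import re

# Component suffixes named in the comment above.
COMPONENT_SUFFIXES = ("-Data.db", "-Filter.db", "-Index.db", "-Statistics.db")

# Any path-like token ending in -Data.db; quotes, brackets, parentheses,
# commas and whitespace are treated as token boundaries.
DATA_FILE = re.compile(r"[^\s'\"\[\](),]+-Data\.db")


def files_to_collect(log_line):
    """List every component file for each -Data.db path found in a log line."""
    wanted = []
    for data_path in DATA_FILE.findall(log_line):
        stem = data_path[: -len("-Data.db")]
        wanted.extend(stem + suffix for suffix in COMPONENT_SUFFIXES)
    return wanted


if __name__ == "__main__":
    # Made-up log line for illustration only; real lines may look different.
    sample = "Compacting [/data/test/seen2-f-242-Data.db, /data/test/seen2-f-583-Data.db]"
    for path in files_to_collect(sample):
        print(path)
```

The resulting list could then be passed to whatever archiving tool is at hand.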

        akaris Mike Nadeau added a comment -

        I found out which files they are; I'm uploading them somewhere at the moment (they're quite big).

        Does anyone know if we can always safely remove the "snapshots" folder? I think it was created by my nodetool repair command (which I interrupted).

        slebresne Sylvain Lebresne added a comment -

        Does anyone know if we can always safely remove the "snapshots" folder? I think it was created by my nodetool repair command (which I interrupted).

        It's more likely the pre-scrub snapshot. It contains your sstables as they were before the scrub command. It's unclear whether the scrub did anything in your case, but if that data is important I would suggest you keep it until we figure out what is going on with your sstables.

        akaris Mike Nadeau added a comment -

        Here are my SSTables -
        http://184.107.12.190/DATA.tgz

        The archive size is 2.5GB, extracted it's around 12GB.

        By the way, my issue might be a bad hard disk... I saw a couple of these exceptions in the log:

        org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        And
        Caused by: java.io.IOException: Corrupt (negative) value length encountered

        I moved all my data to a hard disk I know to be healthy and restarted Cassandra. I just hope we can repair everything!
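
To get a rough idea of how often these errors occur, the log can be scanned for the message fragments quoted in this thread. The sketch below is a hedged illustration only: the signature list is taken from the exceptions mentioned here and is not exhaustive, and the function name `count_corruption_signatures` is hypothetical.

```python
from collections import Counter

# Message fragments quoted in this thread; not an exhaustive list.
CORRUPTION_SIGNATURES = (
    "CorruptColumnException",
    "Corrupt (negative) value length encountered",
    "IndexOutOfBoundsException",
)


def count_corruption_signatures(log_lines):
    """Count how many lines mention each corruption-related signature."""
    counts = Counter()
    for line in log_lines:
        for signature in CORRUPTION_SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0",
        "Caused by: java.io.IOException: Corrupt (negative) value length encountered",
        "an unrelated line",
    ]
    for signature, count in count_corruption_signatures(sample).items():
        print(f"{count:4d}  {signature}")
```

To scan a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.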

        slebresne Sylvain Lebresne added a comment -

        2.5GB, ouch

        I'll try to have a look at those sstables. In the meantime, if only one node is having problems, and unless your replication factor is 1, your safest and probably quickest way to get back to a correct state may well be to replace the node with the potentially bad disk.

        akaris Mike Nadeau added a comment -

        My replication factor is 3 and I have 7 other healthy nodes.

        Could you help me with the procedure to recover the node? It's the first time I've faced data corruption with Cassandra. One of my concerns is that my corrupted node is the only node with bootstrap=false, and it's the seed for all the other nodes. Right now I have only one seed; I never needed more, and frankly I'm not sure how to work out how many seeds a cluster should have.

        Here's my guess for the procedure; please tell me if I have anything wrong:
        1- Stop the corrupted node (let's call it node1)
        2- Change bootstrap to false on another node (let's call it node2) and remove its seed
        3- On all other nodes, replace the seed (node1) with node2 and keep bootstrap set to true
        4- Restart node2
        5- Restart all other nodes; they should be fine
        6- Flush node1's data and restart it; it should get its data back

        I might be worrying too much about the bootstrap/seed settings; I still don't fully understand them.

        Thanks.

        wajam Sébastien Giroux added a comment -

        org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0

        This is the exception I had BEFORE running scrub. Scrub is supposed to rebuild your sstables while fixing some corruption issues (if any). Are you sure scrub had finished running on all the problematic sstables when you saw this exception? It can take a little while to run, especially on a large dataset.

        akaris Mike Nadeau added a comment -

        Yes, the scrub finished running, and it seems to have done a lot of work, but I still get the exception.

        I'll go ahead and try to replace the node.

        jbellis Jonathan Ellis added a comment -

        Are you using TTLs in this CF?

        akaris Mike Nadeau added a comment -

        No TTL.

        It's a very simple CF for which we generate a unique primary key (no timeUUID). Each document is very small, maybe around 300 bytes in total on average.

        jbellis Jonathan Ellis added a comment -

        Are you doing deletes at all in that CF?

        akaris Mike Nadeau added a comment -

        No deletes.

        jbellis Jonathan Ellis added a comment -

        Please re-open if you can reproduce on 1.0.10. I suspect this was fixed by something like CASSANDRA-3957.

        tikoivul Tommi Koivula added a comment -

        We're getting exactly the same error when running a forced major compaction with Cassandra 1.1.0. One of the CFs is producing this error every time. Compaction works for other CFs. There is only one node in this cluster.

        Cassandra version: 1.1.0
        Environment: Ubuntu 12.04, 64-bit Sun JVM 1.6.0_33-b04

        Stack trace:

        java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at com.profium.rdfcassandra.CassandraBridge.compact(Unknown Source)
        at com.profium.rdfcassandra.CassandraBridge.addColumnFamily(Unknown Source)
        at com.profium.rdfcassandra.CassandraBackedIntegerPersistence.<init>(Unknown Source)
        at com.profium.sir.ctx.ReasonerModule.createIntegerPersistence(ReasonerModule.java:149)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.profium.ioc.exec.PiocBeanFactory.create(PiocBeanFactory.java:528)
        at com.profium.ioc.exec.PiocBeanFactory.getOrCreate(PiocBeanFactory.java:456)
        at com.profium.ioc.exec.PiocBeanDefinitionExecutor.executeCreate(PiocBeanDefinitionExecutor.java:32)
        at com.profium.ioc.ctx.ApplicationContextFactory.createApplicationContext(ApplicationContextFactory.java:186)
        at com.profium.ioc.ctx.ApplicationContextFactory.initApplicationContext(ApplicationContextFactory.java:382)
        at SIR.exec.SirImpl.doChangeMdsRunlevel(SirImpl.java:799)
        at SIR.exec.SirImpl.raiseToRunlevel(SirImpl.java:685)
        at com.profium.sir.exec.MdsMonitorThread$1.run(MdsMonitorThread.java:84)
        at java.lang.Thread.run(Thread.java:662)
        Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:518)
        at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:340)
        at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:50)
        at org.apache.cassandra.db.Column.isMarkedForDelete(Column.java:110)
        at org.apache.cassandra.db.Column.reconcile(Column.java:207)
        at org.apache.cassandra.db.DeletedColumn.reconcile(DeletedColumn.java:58)
        at org.apache.cassandra.db.ArrayBackedSortedColumns.resolveAgainst(ArrayBackedSortedColumns.java:168)
        at org.apache.cassandra.db.ArrayBackedSortedColumns.addAllColumns(ArrayBackedSortedColumns.java:232)
        at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.addAll(AbstractThreadUnsafeSortedColumns.java:98)
        at org.apache.cassandra.db.AbstractColumnContainer.addAll(AbstractColumnContainer.java:92)
        at org.apache.cassandra.db.AbstractColumnContainer.addAll(AbstractColumnContainer.java:97)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:126)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:145)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:97)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:82)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:177)
        at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:360)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        ... 1 more
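
        For readers following the trace: the innermost frames show HeapByteBuffer.getInt failing inside DeletedColumn.getLocalDeletionTime. The snippet below is a minimal sketch of that failure mode, not Cassandra's code. It assumes the tombstone value is expected to hold a 4-byte int and shows that an absolute getInt on a shorter buffer throws IndexOutOfBoundsException. The class and method names (TombstoneValueDemo, readLocalDeletionTime, hasFullInt) are hypothetical.

```java
import java.nio.ByteBuffer;

public class TombstoneValueDemo {
    // Hypothetical helper: absolute read of a 4-byte int at the buffer's
    // current position. Assumes the value is meant to hold exactly one int.
    public static int readLocalDeletionTime(ByteBuffer value) {
        return value.getInt(value.position());
    }

    // Hypothetical guard: true only if at least 4 readable bytes remain.
    public static boolean hasFullInt(ByteBuffer value) {
        return value.remaining() >= 4;
    }

    public static void main(String[] args) {
        // A well-formed value: 4 bytes holding an int.
        ByteBuffer ok = ByteBuffer.allocate(4);
        ok.putInt(0, 1330000000);
        System.out.println("well-formed value reads as " + readLocalDeletionTime(ok));

        // A truncated value: only 2 bytes, so the 4-byte read cannot succeed.
        ByteBuffer truncated = ByteBuffer.allocate(2);
        System.out.println("truncated value has full int: " + hasFullInt(truncated));
        try {
            readLocalDeletionTime(truncated);
        } catch (IndexOutOfBoundsException e) {
            // Same exception type as the "Caused by" line in the trace.
            System.out.println("truncated value throws " + e.getClass().getName());
        }
    }
}
```

        If that assumption holds, the trace is consistent with a tombstone whose stored value is shorter than 4 bytes, which is why running scrub on the affected sstables is the usual first step.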

        jbellis Jonathan Ellis added a comment -

        Commenting on a bug filed against a two-year-old release, for an eight-month-old .0 version, is probably not the best way to address this.

        If you can reproduce with 1.1.8 then please open a new ticket. Standard advice of running scrub first applies as above.


          People

          • Assignee: Unassigned
          • Reporter: akaris Mike Nadeau
          • Votes: 0
          • Watchers: 2

              Development