Cassandra / CASSANDRA-5125

Support indexes on composite column components (clustered columns)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.0 beta 1
    • Component/s: Core
    • Labels: None

      Description

      Given

      CREATE TABLE foo (
        a int,
        b int,
        c int,
        PRIMARY KEY (a, b)
      );
      

      We should support CREATE INDEX ON foo(b).
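
      As a rough illustration of the request (a sketch only, assuming a version that includes this fix, i.e. 2.0 beta 1 or later; the inserted values are made up), the index on the clustering column b would allow a lookup on b without specifying the partition key a:

      CREATE INDEX ON foo (b);

      INSERT INTO foo (a, b, c) VALUES (0, 1, 2);

      // Restricts only the clustering column; this relies on the index on b
      SELECT * FROM foo WHERE b = 1;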

      1. 0004-Handle-partition-key-indexing.txt
        16 kB
        Sylvain Lebresne
      2. 0003-Handle-new-type-of-IndexExpression.txt
        39 kB
        Sylvain Lebresne
      3. 0002-Generalize-CompositeIndex-for-all-column-type.txt
        53 kB
        Sylvain Lebresne
      4. 0001-Refactor-aliases-into-column_metadata.txt
        108 kB
        Sylvain Lebresne

          Activity

          Sylvain Lebresne added a comment -

          Attaching patches for that (also pushed to https://github.com/pcmanus/cassandra/commits/5125).

          The first part of this ticket is about how we store the information that a clustering key column is indexed. For "regular" columns we use ColumnDefinition, and the indexing code assumes that too, so the simplest approach is probably to reuse ColumnDefinition here as well. But then it's easier to always store all primary key columns as ColumnDefinition, which pretty much obsoletes the old key_aliases and column_aliases. There are a few related details worth noting:

          1. While this obsoletes the aliases, the patch does not remove them from the schema, for compatibility's sake. Truth is, I'm not sure there is a way to remove a field from the schema without breaking rolling upgrades at this point.
          2. After this patch, CFDefinition becomes much less useful, as CFMetadata + ColumnDefinition hold pretty much the same information in pretty much the same form. So we could slightly simplify things by removing CFDefinition. However, this is left for later (it won't be a 3-line patch).

          After that, the patch adds a new type of composite index to handle indexing clustering keys (it shares most of its code with the existing regular composite index) and updates CQL3 to allow creating and querying the new indexes. In particular, it is slightly tricky in SelectStatement to recognize, when a clustering key is restricted, whether secondary indexes should be used or not.

          The last patch adds support for indexing components of the partition key. We don't allow indexing the first component of the partition key, as it makes no sense (it's already primary indexed), but if the partition key is composite, secondary indexing the second and later components can be useful.

          Lastly, I'll note that the patches only add these new indexes for non-compact tables. We should generalize to compact tables too, but that would require a bit of generalization that I'd rather add in a second phase.
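
          To make the cases above concrete, here is a small sketch. The table bar and its columns are hypothetical and not part of the patch; it assumes a non-compact table on a version that includes this change. It shows an index on a clustering column, an index on the second component of a composite partition key, and the kind of queries where the distinction between a primary key lookup and a secondary index lookup presumably matters:

          CREATE TABLE bar (
            k1 int,
            k2 int,
            c int,
            v int,
            PRIMARY KEY ((k1, k2), c)
          );

          CREATE INDEX ON bar (c);   // clustering column
          CREATE INDEX ON bar (k2);  // second component of the partition key

          // Full primary key is given, so no secondary index should be needed
          SELECT * FROM bar WHERE k1 = 0 AND k2 = 1 AND c = 2;

          // Only the clustering column is restricted; this relies on the index on c
          SELECT * FROM bar WHERE c = 2;

          // Only part of the partition key is restricted; this relies on the index on k2
          SELECT * FROM bar WHERE k2 = 1;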

          Sylvain Lebresne added a comment -

          Rebased patches attached.

          Sylvain Lebresne added a comment -

          Things move fast on trunk lately, so I've pushed a rebased version at https://github.com/pcmanus/cassandra/commits/5125-2 to avoid rebasing every day.

          Carl Yeksigian added a comment -

          A few errors when running the test suite:

          Testcase: testCli(org.apache.cassandra.cli.CliTest): Caused an ERROR
          java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 4
          at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1533)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:680)
          Caused by: org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 4
          at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:69)
          at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:121)
          at org.apache.cassandra.db.index.SecondaryIndexManager$PerColumnIndexUpdater.update(SecondaryIndexManager.java:623)
          at org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:313)
          at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:168)
          at org.apache.cassandra.db.Memtable.resolve(Memtable.java:253)
          at org.apache.cassandra.db.Memtable.put(Memtable.java:169)
          at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:852)
          at org.apache.cassandra.db.Table.apply(Table.java:379)
          at org.apache.cassandra.db.Table.apply(Table.java:342)
          at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:189)
          at org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:667)
          at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1529)
          ... 3 more

          Testcase: testIndexDeletions(org.apache.cassandra.db.ColumnFamilyStoreTest): Caused an ERROR
          A long is exactly 8 bytes: 4
          org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 4
          at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:69)
          at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:121)
          at org.apache.cassandra.db.index.SecondaryIndexManager$PerColumnIndexUpdater.update(SecondaryIndexManager.java:623)
          at org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:313)
          at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:168)
          at org.apache.cassandra.db.Memtable.resolve(Memtable.java:253)
          at org.apache.cassandra.db.Memtable.put(Memtable.java:169)
          at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:852)
          at org.apache.cassandra.db.Table.apply(Table.java:379)
          at org.apache.cassandra.db.Table.apply(Table.java:342)
          at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:189)
          at org.apache.cassandra.db.ColumnFamilyStoreTest.testIndexDeletions(ColumnFamilyStoreTest.java:301)

          Testcase: testIndexUpdate(org.apache.cassandra.db.ColumnFamilyStoreTest): Caused an ERROR
          Index: 0, Size: 0
          java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
          at java.util.ArrayList.RangeCheck(ArrayList.java:547)
          at java.util.ArrayList.get(ArrayList.java:322)
          at org.apache.cassandra.db.ColumnFamilyStoreTest.testIndexUpdate(ColumnFamilyStoreTest.java:398)

          Sylvain Lebresne added a comment -

          My bad. That was due to a rebase typo that I had fixed in my CASSANDRA-5417 branch but not in this one. I've pushed the fix to the same GitHub branch as above (https://github.com/pcmanus/cassandra/commits/5125-2).

          Aleksey Yeschenko added a comment -

          +1

          Sylvain Lebresne added a comment -

          Committed, thanks

          Denis Angilella added a comment -

          Quoting Sylvain Lebresne's comment above: "Lastly, I'll note that the patches only add these new indexes for non-compact tables. We should generalize to compact tables too, but that would require a bit of generalization that I'd rather add in a second phase."

          With 2.1 and compact tables, it is possible to CREATE INDEX on composite primary key columns, but queries return no results for the tests below.
          I'm adding this as a comment for now; I can open a new ticket if you prefer.

          CREATE TABLE users2 (
            userID uuid,
            fname text,
            zip int,
            state text,
            PRIMARY KEY ((userID, fname))
          ) WITH COMPACT STORAGE;

          CREATE INDEX ON users2 (userID);
          CREATE INDEX ON users2 (fname);

          INSERT INTO users2 (userID, fname, zip, state) VALUES (b3e3bc33-b237-4b55-9337-3d41de9a5649, 'John', 10007, 'NY');

          // The following queries return 0 rows instead of the 1 expected
          SELECT * FROM users2 WHERE fname='John';
          SELECT * FROM users2 WHERE userid=b3e3bc33-b237-4b55-9337-3d41de9a5649;
          SELECT * FROM users2 WHERE userid=b3e3bc33-b237-4b55-9337-3d41de9a5649 AND fname='John';

          // Dropping the secondary indexes restores normal behavior
          
          Sylvain Lebresne added a comment -

          Denis Angilella: Correct, the validation during index creation is broken. Would you mind creating a ticket so we can track the fix?

          Denis Angilella added a comment -

          Sylvain Lebresne: I created CASSANDRA-8156 to track the fix.


            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Jonathan Ellis
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              2
              Watchers:
              8