Description
Creating a column family with a secondary index, a column validator, and a default validator, and then later renaming or dropping the index, results in read errors and leaves the data set unreadable.
Updating the CF to restore the old index does not resolve the problem. Inserts and writes continue to succeed, but reads fail as soon as they reach a row affected by one of these cases. The only workaround that I've been able to use is to know exactly what the columns and changes were prior to the CF change, then iterate through all the rows and insert the same column name with a NULL value. One problem here is that you _must_ know the row keys in advance, because you can't do a read to get them.
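The workaround described above could be sketched roughly as follows. This is only an illustration, not a tested fix: `blank_out_column` is a hypothetical helper name, the `cf` argument is assumed to expose an `insert(key, columns)` method like pycassa's `ColumnFamily`, and an empty string stands in for the "NULL value" (that choice is an assumption).

```python
def blank_out_column(cf, row_keys, column_name, placeholder=""):
    """Overwrite column_name in every listed row with a placeholder value.

    Hypothetical helper. Assumes cf has an insert(key, columns) method,
    as pycassa's ColumnFamily does. The row keys must be supplied by the
    caller because reads on the affected CF fail.
    """
    written = []
    for key in row_keys:
        # Replace the old (LongType-encoded) value with the placeholder.
        cf.insert(key, {column_name: placeholder})
        written.append(key)
    return written
```

For the reproduction below this would be called as `blank_out_column(cf, ['key'], 'colour')`, with the row keys coming from some source other than the cluster itself.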
1) create a secondary index on a column with a validator and a default validator
2) insert a row
3) read and verify the row
4) update the CF/index/name/validator
5) read the CF and get an error (CLI or Pycassa)
CLI Commands to create the row and CF/Index
create column family cf_testing with comparator=UTF8Type and default_validation_class=UTF8Type and column_metadata=[
{column_name: colour, validation_class: LongType, index_type: KEYS}];
set cf_testing['key']['colour']='1234';
list cf_testing;
update column family cf_testing with comparator=UTF8Type and default_validation_class=UTF8Type and column_metadata=[
{column_name: color, validation_class: LongType, index_type: KEYS}];
ERROR from the CLI:
list cf_testing;
Using default limit of 100
-------------------
RowKey: key
invalid UTF8 bytes 00000000000004d2
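One plausible reading of this error, offered as an interpretation rather than a confirmed diagnosis: the bytes in the message look like the value 1234 as it was stored under the LongType validator, and after the rename the column appears to be checked against the UTF8Type default validator instead. The sketch below assumes LongType stores values as 8-byte big-endian signed integers. The helper name `is_valid_utf8` is made up for the example, and Python's decoder is only a stand-in for whatever check the server performs.

```python
import struct

# Bytes reported by the CLI error, written as a hex string.
reported_hex = "00000000000004d2"
raw = bytes(bytearray.fromhex(reported_hex))

# Assumption: LongType uses an 8-byte big-endian signed integer encoding.
as_long = struct.unpack(">q", raw)[0]


def is_valid_utf8(data):
    """Return True if data decodes as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(as_long)             # 1234, the value set on 'colour'
print(is_valid_utf8(raw))  # False: 0xd2 starts a 2-byte sequence that never completes
```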
Here is the Pycassa client code that shows this error too.
badindex.py
#!/usr/local/bin/python2.7
import pycassa
import uuid
import sys


def main():
    try:
        keyspace = "badindex"
        serverPoolList = ['localhost:9160']
        pool = pycassa.connect(keyspace, serverPoolList)
    except Exception:
        print "couldn't get a connection"
        sys.exit(1)
    cfname = "cf_testing"
    cf = pycassa.ColumnFamily(pool, cfname)
    # Reading the row written before the CF update triggers the error.
    results = cf.get_range(start='key', finish='key', row_count=1)
    for key, columns in results:
        print key, '=>', columns


if __name__ == "__main__":
    main()