Cassandra / CASSANDRA-12857

Upgrade procedure between 2.1.x and 3.0.x is broken

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Urgent
    • Resolution: Won't Fix
    • Severity: Critical

    Description

      It is not possible to safely perform an in-place Cassandra upgrade from 2.1.14 to 3.0.9.

      Distribution: deb packages from the DataStax community repo.

      The upgrade was performed according to the procedure in this document: https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgrdCassandraDetails.html

      Potential reason: the upgrade procedure creates a corrupted system_schema keyspace, which then propagates through the cluster and kills it.

      We started with one datacenter containing 19 nodes divided into two racks.
      The first rack was upgraded successfully, and nodetool describecluster reported two schema versions: one for the upgraded nodes and another for the non-upgraded nodes.

      On starting the new version on the first node of the second rack:

      INFO  [main] 2016-10-25 13:06:12,103 LegacySchemaMigrator.java:87 - Moving 11 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
      INFO  [main] 2016-10-25 13:06:12,104 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@7505e6ac
      INFO  [main] 2016-10-25 13:06:12,200 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@64414574
      INFO  [main] 2016-10-25 13:06:12,204 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@3f2c5f45
      INFO  [main] 2016-10-25 13:06:12,207 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@2bc2d64d
      INFO  [main] 2016-10-25 13:06:12,301 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@77343846
      INFO  [main] 2016-10-25 13:06:12,305 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@19b0b931
      INFO  [main] 2016-10-25 13:06:12,308 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@44bb0b35
      INFO  [main] 2016-10-25 13:06:12,311 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@79f6cd51
      INFO  [main] 2016-10-25 13:06:12,319 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@2fcd363b
      INFO  [main] 2016-10-25 13:06:12,356 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@609eead6
      INFO  [main] 2016-10-25 13:06:12,358 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@7eb7f5d0
      INFO  [main] 2016-10-25 13:06:13,958 LegacySchemaMigrator.java:97 - Truncating legacy schema tables
      INFO  [main] 2016-10-25 13:06:26,474 LegacySchemaMigrator.java:103 - Completed migration of legacy schema tables
      INFO  [main] 2016-10-25 13:06:26,474 StorageService.java:521 - Populating token metadata from system tables
      INFO  [main] 2016-10-25 13:06:26,796 StorageService.java:528 - Token metadata: Normal Tokens: [HUGE LIST of tokens]
      INFO  [main] 2016-10-25 13:06:29,066 ColumnFamilyStore.java:389 - Initializing ...
      INFO  [main] 2016-10-25 13:06:29,066 ColumnFamilyStore.java:389 - Initializing ...
      INFO  [main] 2016-10-25 13:06:45,894 AutoSavingCache.java:165 - Completed loading (2 ms; 460 keys) KeyCache cache
      INFO  [main] 2016-10-25 13:06:46,982 StorageService.java:521 - Populating token metadata from system tables
      INFO  [main] 2016-10-25 13:06:47,394 StorageService.java:528 - Token metadata: Normal Tokens:[HUGE LIST of tokens]
      INFO  [main] 2016-10-25 13:06:47,420 LegacyHintsMigrator.java:88 - Migrating legacy hints to new storage
      INFO  [main] 2016-10-25 13:06:47,420 LegacyHintsMigrator.java:91 - Forcing a major compaction of system.hints table
      INFO  [main] 2016-10-25 13:06:50,587 LegacyHintsMigrator.java:95 - Writing legacy hints to the new storage
      INFO  [main] 2016-10-25 13:06:53,927 LegacyHintsMigrator.java:99 - Truncating system.hints table
      ....
      INFO  [main] 2016-10-25 13:06:56,572 MigrationManager.java:342 - Create new table: org.apache.cassandra.config.CFMetaData@242e5306[cfId=c5e99f16-8677-3914-b17e-960613512345,ksName=system_traces,cfName=sessions,flags=[COMPOUND],params=TableParams{comment=tracing sessions, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@3fa913a4, extensions={}},comparator=comparator(),partitionColumns=[[] | [client command coordinator duration request started_at parameters]],partitionKeyColumns=[ColumnDefinition{name=session_id, type=org.apache.cassandra.db.marshal.UUIDType, kind=PARTITION_KEY, position=0}],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UUIDType,columnMetadata=[ColumnDefinition{name=client, type=org.apache.cassandra.db.marshal.InetAddressType, kind=REGULAR, position=-1}, ColumnDefinition{name=command, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=session_id, type=org.apache.cassandra.db.marshal.UUIDType, kind=PARTITION_KEY, position=0}, ColumnDefinition{name=coordinator, type=org.apache.cassandra.db.marshal.InetAddressType, kind=REGULAR, position=-1}, ColumnDefinition{name=request, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=started_at, type=org.apache.cassandra.db.marshal.TimestampType, kind=REGULAR, position=-1}, ColumnDefinition{name=duration, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, position=-1}, ColumnDefinition{name=parameters, 
type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}],droppedColumns={},triggers=[],indexes=[]]
      INFO  [GossipStage:1] 2016-10-25 13:06:57,121 StorageService.java:1969 - Node /10.41.100.31 state jump to NORMAL
      INFO  [GossipStage:1] 2016-10-25 13:06:57,127 TokenMetadata.java:479 - Updating topology for /10.41.100.31
      INFO  [GossipStage:1] 2016-10-25 13:06:57,127 TokenMetadata.java:479 - Updating topology for /10.41.100.31
      INFO  [HANDSHAKE-/10.11.100.19] 2016-10-25 13:06:57,128 OutboundTcpConnection.java:515 - Handshaking version with /10.11.100.19
      .....
      INFO  [main] 2016-10-25 13:07:02,773 MigrationManager.java:342 - Create new table: ……………
      INFO  [main] 2016-10-25 13:07:04,136 MigrationManager.java:302 - Create new Keyspace: KeyspaceMetadata
      

      However, all upgraded nodes then reported the same error many times:

      ERROR [InternalResponseStage:12] 2016-10-25 13:07:26,891 MigrationTask.java:96 - Configuration exception merging remote schema 
      org.apache.cassandra.exceptions.ConfigurationException: Column family comparators do not match or are not compatible (found comparator(org.apache.cassandra.db.marshal.UTF8Type, org.apac......
              at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:787) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:740) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.config.Schema.updateTable(Schema.java:661) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1346) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1302) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1252) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:92) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) [apache-cassandra-3.0.9.jar:3.0.9]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
              at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
      

      nodetool describecluster reported 4 different schema versions:
      1. All nodes still on the old version
      2. 7 migrated nodes from the first rack
      3. 2 migrated nodes from the first rack
      4. 1 node from the second rack
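
      A schema disagreement like this one can be spotted from a script. The following is a minimal Python sketch, assuming the usual nodetool describecluster text layout in which each line under "Schema versions:" has the form "<version>: [<ip>, <ip>, ...]". The helper names parse_schema_versions and in_agreement are hypothetical (not part of any Cassandra tooling), and the sample output, UUIDs, and addresses are invented for illustration.

```python
import re

# One line per schema version, e.g. "<uuid>: [10.0.0.1, 10.0.0.2]".
# The layout is an assumption based on typical describecluster output.
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(output):
    """Map each schema version to the list of nodes reporting it."""
    versions = {}
    for line in output.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions[match.group(1)] = nodes
    return versions


def in_agreement(versions):
    """True when every reachable node reports the same schema version."""
    reachable = {v for v in versions if v != "UNREACHABLE"}
    return len(reachable) == 1


# Illustrative sample only: the UUIDs and addresses are invented.
SAMPLE = """\
Cluster Information:
    Name: Example Cluster
    Schema versions:
        11111111-1111-1111-1111-111111111111: [10.0.0.1, 10.0.0.2]
        22222222-2222-2222-2222-222222222222: [10.0.0.3]
"""

versions = parse_schema_versions(SAMPLE)
print(len(versions), in_agreement(versions))
```

      During the rolling upgrade described here, two versions (upgraded and non-upgraded nodes) were expected; a count above two was the first visible sign of trouble.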

      Meanwhile, the cluster remained fully responsive to reads and writes.
      Nevertheless, the migration was stopped at this point. Further investigation showed corrupted records in system_schema.tables, and system_schema.columns contained duplicated, broken records with \x00 in place of letters.

      dc1_tenant_ssd |                         \x00\x00\x00\x00\x00\x00 |                  0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                UP (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |                          0 |                    0 |           {} | {'compound'} |           864000 | 0ae08450-80b9-11e6-8bf1-0df6cc57511a |               2048 |                           0 |                128 |                  0 |      99PERCENTILE
      dc1_tenant_ssd | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 |                  0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (old CF format, on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |                          0 |                    0 |           {} |    {'dense'} |           864000 | 16c420b0-78fd-11e6-ae98-ff8f609f3a2d |               2048 |                           0 |                128 |                  0 |      99PERCENTILE
      dc1_tenant_ssd |                         \x00\x00\x00\x00\x00\x00 |                  0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                UT (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |                          0 |                    0 |           {} |    {'dense'} |           864000 | c38bce70-78fc-11e6-ae98-ff8f609f3a2d |               2048 |                           0 |                128 |                  0 |      99PERCENTILE
      dc1_tenant_ssd |                                           user_p |                  0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                UP (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |                          0 |                    0 |           {} | {'compound'} |           864000 | 0ae08450-80b9-11e6-8bf1-0df6cc57511a |               2048 |                           0 |                128 |                  0 |      99PERCENTILE
      dc1_tenant_ssd |                                     user_p_oldcf |                  0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (old CF format, on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |                          0 |                    0 |           {} |    {'dense'} |           864000 | 16c420b0-78fd-11e6-ae98-ff8f609f3a2d |               2048 |                           0 |                128 |                  0 |      99PERCENTILE
      dc1_tenant_ssd |                                           user_t |                  0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                UT (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |                          0 |                    0 |           {} |    {'dense'} |           864000 | c38bce70-78fc-11e6-ae98-ff8f609f3a2d |               2048 |                           0 |                128 |                  0 |      99PERCENTILE
      

      Based on the sstabledump output, it is clear that system_schema was corrupted on every node.
      The strange thing is that a single node had been upgraded one day before the rest of the cluster, and its system_schema was fine before the upgrade was rolled out to the other nodes. This was checked specifically.
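
      Duplicated rows of the kind shown in the listing above can also be flagged mechanically. The following is a minimal Python sketch, assuming the rows of system_schema.tables have already been read into (keyspace_name, table_name, id) tuples, for example from a cqlsh export, and assuming the \x00 sequences in the listing stand for actual NUL characters in the name. The helper names is_nul_name and find_suspect_rows are hypothetical.

```python
from collections import defaultdict


def is_nul_name(name):
    """True for a non-empty name consisting solely of NUL characters."""
    return len(name) > 0 and set(name) == {"\x00"}


def find_suspect_rows(rows):
    """Group rows by table id and pair NUL-named rows with valid ones.

    rows: iterable of (keyspace_name, table_name, table_id) tuples.
    Returns {table_id: (valid_names, nul_name_lengths)} for every id
    that has at least one NUL-named row.
    """
    by_id = defaultdict(list)
    for _keyspace, name, table_id in rows:
        by_id[table_id].append(name)

    suspects = {}
    for table_id, names in by_id.items():
        nul_lengths = [len(n) for n in names if is_nul_name(n)]
        if nul_lengths:
            valid = [n for n in names if not is_nul_name(n)]
            suspects[table_id] = (valid, nul_lengths)
    return suspects


# Names and ids taken from the listing in this report.
ROWS = [
    ("dc1_tenant_ssd", "\x00" * 6, "0ae08450-80b9-11e6-8bf1-0df6cc57511a"),
    ("dc1_tenant_ssd", "\x00" * 12, "16c420b0-78fd-11e6-ae98-ff8f609f3a2d"),
    ("dc1_tenant_ssd", "\x00" * 6, "c38bce70-78fc-11e6-ae98-ff8f609f3a2d"),
    ("dc1_tenant_ssd", "user_p", "0ae08450-80b9-11e6-8bf1-0df6cc57511a"),
    ("dc1_tenant_ssd", "user_p_oldcf", "16c420b0-78fd-11e6-ae98-ff8f609f3a2d"),
    ("dc1_tenant_ssd", "user_t", "c38bce70-78fc-11e6-ae98-ff8f609f3a2d"),
]

for table_id, (valid, nul_lengths) in find_suspect_rows(ROWS).items():
    print(table_id, valid, nul_lengths)
```

      In the listing above, each NUL-only name has the same length as the valid name that shares its id (6, 12, and 6 characters).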

      Later, the upgraded nodes refused to restart because of the duplicates in system_schema.tables, failing with this exception:

      java.lang.IllegalStateException: One row required, 2 found
              at org.apache.cassandra.cql3.UntypedResultSet$FromResultSet.one(UntypedResultSet.java:84) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:938) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:928) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:891) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:868) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:856) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:136) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:126) ~[apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:239) [apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:568) [apache-cassandra-3.0.9.jar:3.0.9]
              at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:696) [apache-cassandra-3.0.9.jar:3.0.9]
      

      I am quite confident that this is not a hardware problem: so far we have attempted the upgrade twice, with the same results.
      This very unfortunate migration scenario affected more than one node and brought the cluster to an unusable state with no way back: decommission did not work between different versions, and scrub removed all data from the tables in system_schema.
      We ended up exporting the data and removing the upgraded nodes, along with their data, from the cluster.

      Attachments

        1. cassandra.schema
          347 kB
          Alexander Yasnogor

          People

            Assignee: Unassigned
            Reporter: Alexander Yasnogor (ppinstal)