Cassandra / CASSANDRA-14927

During data migration from 7 node to 21 node cluster using sstableloader, new data is being populated on the new tables & data is being duplicated on user type tables


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Urgent
    • Resolution: Not A Problem
    • Affects Version: 2.1.13
    • Fix Version: None

    Description

      I'm trying to migrate data from 7 node (single DC) cluster to a 21 node (3 DC) cluster using sstableloader.

      We have the same versions on both the old and new clusters.

      cqlsh 5.0.1 

      Cassandra 2.1.13 

      CQL spec 3.2.1 

      Old and New clusters are in different networks. So we opened the following ports between them.

      7000 - storage port
      7001 - SSL storage port
      7199 - JMX port
      9042 - CQL client port
      9160 - Thrift client port

      We use vnodes in the clusters.
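Since the old and new clusters sit in different networks, it can save time to verify that the firewall really passes the ports listed above before attempting a stream. A minimal sketch; the target IP is a placeholder, and `check_ports` only prints the probe commands rather than running them:

```shell
TARGET="192.168.58.41"   # placeholder: any node in the destination cluster

# Print one connectivity probe per required port
# (7000 storage, 7001 SSL storage, 7199 JMX, 9042 CQL, 9160 Thrift).
check_ports() {
    for port in 7000 7001 7199 9042 9160; do
        # Dry run: prints the probe; drop the `echo` to actually test,
        # e.g.  nc -zv -w 2 "$TARGET" "$port"
        echo "nc -zv -w 2 $TARGET $port"
    done
}
check_ports
```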

      We made sure the cassandra.yaml file on the new cluster is set correctly by changing the following options:

       

      cluster_name: 'MyCassandraCluster'
      num_tokens: 256
      seed_provider:
          - class_name: org.apache.cassandra.locator.SimpleSeedProvider
            parameters:
                - seeds: "10.168.66.41,10.176.170.59"
      listen_address: localhost
      endpoint_snitch: GossipingPropertyFileSnitch

      We also edited cassandra-rackdc.properties on each node to specify its respective DC and rack.

      While creating the keyspaces, we changed the replication strategy to NetworkTopologyStrategy.
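Note that sstableloader streams data but does not create schema, so every keyspace and table must already exist on the target cluster. A sketch of the keyspace DDL for a 3-DC layout; the keyspace name, the DC names (DC1/DC2/DC3), and the replication counts are placeholders, and should match whatever the cassandra-rackdc.properties files actually declare:

```shell
# Emit the CREATE KEYSPACE statement; DC names and counts below are
# hypothetical placeholders, not taken from the actual cluster.
keyspace_ddl() {
    cat <<'EOF'
CREATE KEYSPACE app_properties
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3, 'DC2': 3, 'DC3': 3 };
EOF
}
keyspace_ddl
# To apply it on the new cluster, pipe it to cqlsh, e.g.:
#   keyspace_ddl | cqlsh 192.168.58.41 -u popps
```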

       

      The cluster looks healthy; all the nodes are Up and Normal.

       

      I was able to get the data from the old cluster to the new cluster. However, along with the data from the old cluster, I see some new rows being populated in the tables on the new cluster, and data is being duplicated in the tables that use user-defined types.

      We have used the following steps to migrate data:

      1. Took snapshots of all the keyspaces that we want to migrate (9 keyspaces), using the nodetool snapshot command on the source nodes, specifying the hostname, JMX port, and keyspace:

      /a/cassandra/bin/nodetool -u $(sudo su - company -c "cat /a/cassandra/jmxremote.password" | awk '{print $1}') -pw $(sudo su - company -c "cat /a/cassandra/jmxremote.password" | awk '{print $2}')  -h localhost -p 7199 snapshot keyspace_name

      2. After taking the snapshots, move each snapshot directory from the source node to the target node:
               
        → Create a tar file on the source node for the snapshot directory we want to move to the target node.
             tar -cvf file.tar snapshot_name
        → Move this file.tar from the source node to the local machine.
             scp -S gwsh root@192.168.64.99:/a/cassandra/data/file.tar .
        → Now move this file.tar from the local machine to a new directory (example: test) on the target node.
            scp -S gwsh file.tar root@192.168.58.41:/a/cassandra/data/test/.
      3. Now untar this file.tar in the test directory on the target node.
      4. The path of the sstables must be the same on both source and target.
      5. To bulk load these files, run sstableloader on the source node, indicating one or more nodes in the destination cluster with the -d flag (which accepts a comma-separated list of IP addresses or hostnames), and specify the path to the sstables on the source node.

      /a/cassandra/bin/sstableloader -d host_IP path_to_sstables

                Example:

      /a/cassandra/bin# sstableloader -d 192.168.58.41 -u popps -pw ******* -tf org.apache.cassandra.thrift.SSLTransportFactory -ts /a/cassandra/ssl/truststore.jks -tspw test123 -ks /a/cassandra/ssl/keystore.jks -kspw test123 -f /a/cassandra/conf/cassandra.yaml /a/cassandra/data/app_properties/admins-58524140431511e8bbb6357f562e11ca

      Summary statistics:
      Connections per host: 1
      Total files transferred: 9
      Total bytes transferred: 1787893
      Total duration (ms): 2936
      Average transfer rate (MB/s): 0
      Peak transfer rate (MB/s): 0
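The steps above can be sketched as one dry-run script per keyspace. Every host, path, and snapshot name below is a placeholder copied from the examples in this description (JMX and SSL credentials omitted), and `migrate_cmds` only prints the commands instead of executing them:

```shell
KS="app_properties"
SRC="192.168.64.99"    # a source-cluster node
DST="192.168.58.41"    # a destination-cluster node

migrate_cmds() {
    # 1. Snapshot the keyspace on the source node (JMX auth omitted here).
    echo "nodetool -h localhost -p 7199 snapshot $KS"
    # 2-3. Pack the snapshot, copy it off the source, then onto the target.
    echo "tar -cvf file.tar snapshot_name"
    echo "scp -S gwsh root@$SRC:/a/cassandra/data/file.tar ."
    echo "scp -S gwsh file.tar root@$DST:/a/cassandra/data/test/."
    # 4-5. Stream each table directory with sstableloader; the
    #      keyspace/table directory layout must match the source.
    echo "/a/cassandra/bin/sstableloader -d $DST /a/cassandra/data/$KS/table_dir"
}
migrate_cmds   # dry run: prints the five commands
```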

       

      We performed these steps for all the tables, then checked the row counts in the old and new tables using cqlsh:

      cqlsh> SELECT COUNT(*) FROM keyspace.table;

      Example for a single table:

      count on new table: 341

      count on old table: 303

       

      We are also able to identify the differences between the tables by using the 'sdiff' command, as follows:

      • Created .txt/.csv files for the tables in the old and new clusters.
      • Compared them using the sdiff command.
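The CSV comparison can also be done row-by-row with `comm` (a close cousin of the sdiff step above), which directly lists the rows that exist only on the new cluster. A sketch with stand-in files; in practice the CSVs would come from a `COPY ... TO` export run against each cluster:

```shell
# Stand-in exports; replace with real COPY TO output from each cluster:
#   cqlsh 192.168.64.99 -e "COPY keyspace.table TO 'old.csv';"
#   cqlsh 192.168.58.41 -e "COPY keyspace.table TO 'new.csv';"
printf '1,a\n2,b\n' > old.csv
printf '1,a\n2,b\n3,c\n' > new.csv

# Sort first so row order does not show up as a difference.
sort old.csv > old.sorted
sort new.csv > new.sorted

# Rows present only on the new cluster (the suspect "new data"):
comm -13 old.sorted new.sorted   # prints: 3,c
```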

       

      Could someone help me understand the cause of the new data appearing in the tables on the new cluster?

      Please let me know if you need more info.

      PS: After migrating the data for the first time and seeing these issues, we TRUNCATEd all the tables, DROPped the tables with user-defined types, and recreated the dropped tables. We then ran the same migration procedure again, and we still see the same issues.


      People

            Assignee: Unassigned
            Reporter: KALYAN CHAKRAVARTHY KANCHARLA
            Votes: 0
            Watchers: 4
