  1. Cassandra
  2. CASSANDRA-9195

PITR commitlog replay only actually replays mutation every other time


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 2.1.6
    • Component/s: None
    • Labels:
      None
    • Severity:
      Normal

      Description

      Version: Cassandra 2.1.4.374 | DSE 4.7.0

      The main issue is that the restore cycle only replays the mutations every
      other attempt. On the first attempt, it restores the snapshot as expected,
      and the Cassandra system load shows that it is reading the mutations, but
      they are not actually replayed, and at the end you are left with only the
      snapshot data (2k records).

      If you run the restore cycle again, the commitlogs are replayed as expected,
      and the expected data is present in the table (4k records, with a spot check of
      record 4500, which is in the commitlog but not the snapshot).

      If you then run the cycle again, it fails; run it once more, and it works. The
      works/fails pattern continues. Even re-running the commitlog replay a second time,
      without reloading the snapshot, doesn't work.

      The load process is:

      • set commitlog segment size to 1 MB
      • archive commitlogs to a directory
      • create keyspace/table
      • insert base data
      • initial snapshot
      • write more data
      • capture timestamp
      • write more data
      • final snapshot
      • copy commitlogs to 2nd location
      • modify cassandra-env to replay only specified keyspace
      • modify commitlog properties to restore from 2nd location, with noted timestamp
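
      The configuration changes in the last steps above can be sketched roughly as
      follows. This is an illustration only, not the contents of the attached
      loader.py: the helper names and the example paths are hypothetical, and the
      property names (archive_command, restore_command, restore_directories,
      restore_point_in_time) and the colon-separated GMT timestamp format are taken
      from the stock commitlog_archiving.properties as I understand it for 2.1. The
      replay filter is assumed to be passed through the cassandra.replayList system
      property.

```python
from datetime import datetime, timezone


def restore_point(ts: datetime) -> str:
    """Format a timezone-aware timestamp as restore_point_in_time expects (GMT)."""
    return ts.astimezone(timezone.utc).strftime("%Y:%m:%d %H:%M:%S")


def archiving_properties(archive_dir: str, restore_dir: str, ts: datetime) -> str:
    """Build the text of a commitlog_archiving.properties file (hypothetical helper)."""
    lines = [
        # Copy each finished segment into the archive directory.
        f"archive_command=/bin/cp %path {archive_dir}/%name",
        # Copy archived segments back into place for replay.
        "restore_command=/bin/cp -f %from %to",
        # The "2nd location" the commitlogs were copied to.
        f"restore_directories={restore_dir}",
        # Replay stops at the captured timestamp.
        f"restore_point_in_time={restore_point(ts)}",
    ]
    return "\n".join(lines) + "\n"


def replay_list_opt(target: str) -> str:
    """JVM option to append to JVM_OPTS in cassandra-env.sh (assumed property name)."""
    return f"-Dcassandra.replayList={target}"


# The 1 MB segment size is a cassandra.yaml setting.
SEGMENT_SIZE_YAML = "commitlog_segment_size_in_mb: 1"
```

      The timestamp passed in should carry an explicit timezone; a naive datetime
      would be interpreted as local time by astimezone.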

      The restore cycle is:

      • truncate table
      • load the snapshot with sstableloader
      • flush
      • output data status
      • restart to replay commitlogs
      • output data status
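
      As a rough sketch of the restore cycle above, the steps could be expressed as
      a list of shell commands. This is an assumption-laden illustration rather than
      what loader.py actually does: the function names, the host default, the use of
      a row count as the "data status", and the "service dse restart" command are
      all hypothetical choices.

```python
import subprocess


def restore_cycle_commands(keyspace, table, snapshot_dir, host="127.0.0.1"):
    """Return the commands for one restore cycle, in order (hypothetical helper)."""
    count = f"SELECT COUNT(*) FROM {keyspace}.{table}"
    return [
        ["cqlsh", host, "-e", f"TRUNCATE {keyspace}.{table}"],  # truncate table
        ["sstableloader", "-d", host, snapshot_dir],            # load the snapshot
        ["nodetool", "flush", keyspace, table],                 # flush
        ["cqlsh", host, "-e", count],                           # data status before replay
        ["service", "dse", "restart"],                          # assumed restart command
        ["cqlsh", host, "-e", count],                           # data status after replay
    ]


def run_cycle(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # A real run would need to wait for the node to come back up
            # after the restart before issuing the final query.
            subprocess.run(cmd, check=True)
```

      Running the cycle several times in a row and comparing the two counts from
      each pass is what exposes the alternating behaviour described above.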

      ====
      See the attached .py for a mostly automated reproduction scenario. It expects DSE (I found the problem with DSE 4.7.0-1) rather than "actual" Cassandra, but it does not use any DSE-specific features. The script looks for the configs in the DSE locations, but those paths are set at the top, and there are only two places where DSE is restarted.

        Attachments

        1. loader.py
          8 kB
          Jon Moses
        2. 9195-v2.1.patch
          9 kB
          Branimir Lambov
        3. 9195-newtest-trunk.txt
          12 kB
          Branimir Lambov
        4. 9195-newtest-2.1.txt
          8 kB
          Branimir Lambov
        5. 9195-2.1-v2.patch
          3 kB
          Branimir Lambov

          Activity

            People

            • Assignee:
              blambov Branimir Lambov
              Reporter:
              jmoses Jon Moses
              Authors:
              Branimir Lambov
              Reviewers:
              Jeremiah Jordan
            • Votes:
              0
              Watchers:
              12

              Dates

              • Created:
                Updated:
                Resolved: