Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.0
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Ubuntu, cluster set up with ccm.

      Description

      A 3-node cluster runs version 0.8.9, 1.0.6, or 1.0.7. Exactly one node is taken down, upgraded to trunk, and started again. Counter-add operations then fail with an rpc timeout exception, usually after between 1 and 500 add operations. The failure seems to happen sooner if the coordinator node is NOT the one that was upgraded. Here is the error:

      
      ======================================================================
      ERROR: counter_upgrade_test.TestCounterUpgrade.counter_upgrade_test
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
          self.test(*self.arg)
        File "/home/tahooie/cassandra-dtest/counter_upgrade_test.py", line 50, in counter_upgrade_test
          cursor.execute("UPDATE counters SET row = row+1 where key='a'")
        File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
          raise cql.OperationalError("Request did not complete within rpc_timeout.")
      OperationalError: Request did not complete within rpc_timeout.
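
The repeated counter-add step described above could be sketched roughly as follows. This is only an illustration, not the code in counter_upgrade_test.py: the helper name `first_failing_add` and its parameters are hypothetical, and the statement is the one shown in the traceback. The caller supplies the `execute` callable, so nothing here assumes a particular driver API.

```python
def first_failing_add(execute, max_adds=500,
                      statement="UPDATE counters SET row = row+1 where key='a'",
                      timeout_errors=(Exception,)):
    """Issue up to `max_adds` counter increments through `execute`.

    Returns the 1-based number of the first add that raised one of
    `timeout_errors`, or None if every add succeeded.
    (Hypothetical helper; names are assumptions for illustration.)
    """
    for attempt in range(1, max_adds + 1):
        try:
            execute(statement)
        except timeout_errors:
            # This add failed; report how far the loop got.
            return attempt
    return None
```

With the driver from the traceback, one would presumably pass `cursor.execute` as `execute` and `(cql.OperationalError,)` as `timeout_errors`. A result of None after 500 adds would suggest the timeout did not reproduce on that run.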
      
      
      1. CASSANDRA-3804-1.1-v2.patch
        4 kB
        Pavel Yaskevich
      2. node2.log
        20 kB
        Sylvain Lebresne
      3. node1.log
        25 kB
        Sylvain Lebresne
      4. CASSANDRA-3804-1.1.patch
        0.9 kB
        Pavel Yaskevich
      5. CASSANDRA-3804.patch
        1 kB
        Pavel Yaskevich

        Activity

        Tyler Patterson created issue -
        Tyler Patterson made changes -
        Field Original Value New Value
        Description: "It usually takes between 1 and 10 add operations before the failure occurs." changed to "It usually takes between 1 and 500 add operations before the failure occurs." The rest of the description, including the traceback, was unchanged and ended with this paragraph:

        A script has been added to cassandra-dtest (counter_upgrade_test.py) to demonstrate the failure. The newest version of CCM is required to run the test. It is available here if it hasn't yet been pulled: git@github.com:tpatterson/ccm.git
        Jonathan Ellis made changes -
        Assignee Sylvain Lebresne [ slebresne ]
        Fix Version/s 1.1 [ 12317615 ]
        Sylvain Lebresne made changes -
        Summary: "Counter-add operation fails for cluster upgraded from 1.0 to trunk" changed to "upgrade problems from 1.0 to trunk"
        Sylvain Lebresne made changes -
        Assignee Sylvain Lebresne [ slebresne ] Pavel Yaskevich [ xedin ]
        Pavel Yaskevich made changes -
        Attachment CASSANDRA-3804.patch [ 12514228 ]
        Pavel Yaskevich made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 1.0.7 [ 12319244 ]
        Affects Version/s 1.1.0 [ 12317615 ]
        Reviewer jbellis
        Fix Version/s 1.0.8 [ 12319453 ]
        Fix Version/s 1.1.0 [ 12317615 ]
        Pavel Yaskevich made changes -
        Attachment CASSANDRA-3804-1.1.patch [ 12514605 ]
        Pavel Yaskevich made changes -
        Fix Version/s 1.1.0 [ 12317615 ]
        Fix Version/s 1.0.8 [ 12319453 ]
        Affects Version/s 1.1.0 [ 12317615 ]
        Affects Version/s 1.0.7 [ 12319244 ]
        Sylvain Lebresne made changes -
        Attachment node1.log [ 12514803 ]
        Attachment node2.log [ 12514804 ]
        Sylvain Lebresne made changes -
        Attachment node1.log [ 12514803 ]
        Sylvain Lebresne made changes -
        Attachment node2.log [ 12514804 ]
        Sylvain Lebresne made changes -
        Attachment node1.log [ 12514813 ]
        Attachment node2.log [ 12514814 ]
        Pavel Yaskevich made changes -
        Attachment CASSANDRA-3804-1.1-v2.patch [ 12515189 ]
        Pavel Yaskevich made changes -
        Attachment CASSANDRA-3804-1.1-v2.patch [ 12515189 ]
        Pavel Yaskevich made changes -
        Attachment CASSANDRA-3804-1.1-v2.patch [ 12515739 ]
        Pavel Yaskevich made changes -
        Attachment CASSANDRA-3804-1.1-v2.patch [ 12515739 ]
        Pavel Yaskevich made changes -
        Attachment CASSANDRA-3804-1.1-v2.patch [ 12515740 ]
        Sylvain Lebresne made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Reviewer jbellis slebresne
        Resolution Fixed [ 1 ]
        Tyler Patterson made changes -
        Description: removed the closing paragraph ("A script has been added to cassandra-dtest (counter_upgrade_test.py) to demonstrate the failure. The newest version of CCM is required to run the test. It is available here if it hasn't yet been pulled: git@github.com:tpatterson/ccm.git"). The rest of the description, including the traceback, is unchanged.

        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12650957 ] patch-available, re-open possible [ 12749533 ]
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12749533 ] reopen-resolved, no closed status, patch-avail, testing [ 12757085 ]

          People

          • Assignee:
            Pavel Yaskevich
            Reporter:
            Tyler Patterson
            Reviewer:
            Sylvain Lebresne
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
