Derby / DERBY-3617

Failover on slave hangs after stopMaster on master.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.5.1.1
    • Fix Version/s: None
    • Component/s: Replication
    • Environment:
    • Urgency:
      Normal
    • Issue & fix info:
      Repro attached

      Description

      0. Master and slave are in replication mode.
      1. On the master: stopMaster.
      2. On the slave: failover.
      The failover hangs.

      According to "Functional Specification for Derby Replication" Rev. 9.0:
      "Failover": "... . The command is only accepted on the slave if the network connection with the master is down." I therefore expect failover to return a valid Connection.
      However, failover never finishes.

      See attached master and slave derby.log and testrun log.out.
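A test that exercises this scenario can guard against the hang instead of blocking forever. The following is a minimal sketch, not code from the attached patch: `HangGuard` and `outcome` are hypothetical names, and the wrapped call stands in for whatever performs the `failover=true` connection attempt.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of Derby or its replication tests.
final class HangGuard {
    private HangGuard() {}

    /**
     * Runs the call on a daemon thread and waits at most timeoutMillis.
     * Returns "OK" if the call returned, "FAILED: <message>" if it threw,
     * and "HANG" if it did not finish within the timeout.
     */
    static String outcome(Callable<?> call, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-guard");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<?> future = pool.submit(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "OK";
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            return "HANG";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return "INTERRUPTED";
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In a real test the callable would wrap something like `DriverManager.getConnection(url)` with a `failover=true` URL. A blocked JDBC call may not respond to the interrupt, which is why the worker is a daemon thread.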

      1. 3617_NotForCommit.diff.txt
        73 kB
        Ole Solberg
      2. 3617_NotForCommit.stat.txt
        1 kB
        Ole Solberg
      3. db_master-derby.log
        41 kB
        Ole Solberg
      4. db_slave-derby.log
        18 kB
        Ole Solberg
      5. log.out
        44 kB
        Ole Solberg

        Activity

        Ole Solberg created issue -
        Ole Solberg made changes -
        Field Original Value New Value
        Attachment log.out [ 12380051 ]
        Attachment db_slave-derby.log [ 12380053 ]
        Attachment db_master-derby.log [ 12380052 ]
        V.Narayanan made changes -
        Assignee V.Narayanan [ narayanan ]
        V.Narayanan added a comment -

        I tried a simple run without authentication enabled and without executing any transactions,
        and it seemed to work.

        I will try this again with authentication enabled and after executing some transactions.

        On the master
        ------------------

        vn@vn-laptop:~/work/workspaces/Dery3617/master$ java org.apache.derby.tools.ij
        ij version 10.5
        ij> connect 'jdbc:derby://localhost:1527/replicationdb';
        ij> connect 'jdbc:derby://localhost:1527/replicationdb;startMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION1)> connect 'jdbc:derby://localhost:1527/replicationdb;stopMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION2)>

        On the slave
        ---------------

        vn@vn-laptop:~/work/workspaces/Dery3617/slave$ java org.apache.derby.tools.ij
        ij version 10.5
        ij> connect 'jdbc:derby://localhost:1528/replicationdb;startSlave=true;slaveHost=localhost;slavePort=8001';
        ERROR XRE08: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE08, SQLERRMC: Replication slave mode started successfully for database 'replicationdb'. Connection refused because the database is in replication slave mode.
        ij> connect 'jdbc:derby://localhost:1528/replicationdb;failover=true';
        ERROR XRE11: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: failoverreplicationdbXRE11
        ij>
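
For a programmatic run, the connection URLs used in the ij session above can be assembled by a small helper. This is only a sketch: `ReplicationUrls` and its method names are hypothetical, while the URL attributes (`startMaster`, `stopMaster`, `startSlave`, `failover`, `slaveHost`, `slavePort`) are the ones shown in the session. It builds strings only and opens no connections.

```java
// Hypothetical helper; it mirrors the URLs from the ij session and nothing more.
final class ReplicationUrls {
    private ReplicationUrls() {}

    private static String base(String host, int port, String db) {
        return "jdbc:derby://" + host + ":" + port + "/" + db;
    }

    private static String slaveAttrs(String slaveHost, int slavePort) {
        return ";slaveHost=" + slaveHost + ";slavePort=" + slavePort;
    }

    static String startSlave(String host, int port, String db, String slaveHost, int slavePort) {
        return base(host, port, db) + ";startSlave=true" + slaveAttrs(slaveHost, slavePort);
    }

    static String startMaster(String host, int port, String db, String slaveHost, int slavePort) {
        return base(host, port, db) + ";startMaster=true" + slaveAttrs(slaveHost, slavePort);
    }

    // The session passes slaveHost/slavePort with stopMaster as well, so this does too.
    static String stopMaster(String host, int port, String db, String slaveHost, int slavePort) {
        return base(host, port, db) + ";stopMaster=true" + slaveAttrs(slaveHost, slavePort);
    }

    static String failover(String host, int port, String db) {
        return base(host, port, db) + ";failover=true";
    }
}
```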

        V.Narayanan added a comment -

        Hi Ole, could you please attach the repro that was used to run the test?
        Following the steps mentioned does not reproduce this bug.

        Ole Solberg added a comment -

        I am running the test programmatically, which means the timing differs from manual testing via ij.

        Here is what I found by instrumenting the test some more:

        A)
        If I add a sleep between the startMaster and the stopMaster the failover will not hang:
        .
        .
        startSlave
        startMaster
        sleep(10000L)
        stopMaster
        failOver
        Gets SQLSTATE: XRE07, SQLERRMC: Could not perform operation because the database is not in replication master mode.
        connect slave: [localhost:4527//export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat]
        CONNECTED

        B)
        Without this sleep the hang occurs:
        .
        .
        startSlave
        startMaster
        stopMaster
        connect slave: SQLSTATE: 08004, SQLERRMC: Connection refused to database '/export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat' because it is in replication slave mode.'
        sleep(10000L) // No effect.
        connect slave: SQLSTATE: 08004, SQLERRMC: Connection refused to database '/export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat' because it is in replication slave mode.'
        failOver
        HANGS!

        In case A), should the failover ('connect failover=true') not return a valid connection, according to the spec?
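
When reading runs like A) and B), it helps to map each SQLSTATE to the condition it reported. The sketch below is illustrative only: `ReplicationStates` is a hypothetical class, and the descriptions paraphrase the messages quoted in this issue rather than Derby's message catalog. States whose message text does not appear here (such as XRE11) are left unclassified.

```java
import java.sql.SQLException;

// Hypothetical helper; descriptions are paraphrased from messages quoted in this issue.
final class ReplicationStates {
    private ReplicationStates() {}

    static String describe(SQLException e) {
        String state = e.getSQLState();
        if (state == null) {
            return "unknown (no SQLSTATE)";
        }
        switch (state) {
            case "XRE07":
                return "database is not in replication master mode";
            case "XRE08":
                return "slave mode started; connection refused";
            case "08004":
                return "connection refused: database is in replication slave mode";
            default:
                return "unclassified SQLSTATE " + state;
        }
    }
}
```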

        Ole Solberg added a comment -

        Sorry Narayanan for missing your comment (16/Apr/08 04:03 AM).

        I can upload a preliminary patch for the replication tests soon, which contains this test.

        V.Narayanan added a comment -

        Thanks a ton for the clarification Ole. Thank you for the test runs also.

        Ole Solberg added a comment -

        Hi Narayanan!

        I am uploading a patch (3617_NotForCommit) for the replication tests which includes tests for the following 4 cases:

        A) (as above)
        .
        .
        startSlave
        startMaster

        (no bigInsert)
        sleep(10000L)
        stopMaster
        failOver
        Gets SQLSTATE: XRE07, SQLERRMC: Could not perform operation because the database is not in replication master mode.
        connect slave: [localhost:4527//export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat]
        CONNECTED

        B) (as above)
        .
        .
        startSlave
        startMaster

        (no bigInsert)
        (no sleep(10000L))
        stopMaster
        connect slave: SQLSTATE: 08004, SQLERRMC: Connection refused to database '/export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat' because it is in replication slave mode.'
        failOver
        HANGS!

        C) (B + "insert...")
        .
        .
        startSlave
        startMaster
        bigInsert

        (no sleep(10000L))
        stopMaster
        connect slave: SQLSTATE: 08004, SQLERRMC: Connection refused to database '/export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat' because it is in replication slave mode.'
        failOver
        OK! Failover returns a Connection.
        connect slave: [localhost:4527//export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat]
        CONNECTED

        D) (A + "insert...")
        .
        .
        startSlave
        startMaster
        bigInsert
        sleep(10000L)
        stopMaster
        failOver
        Gets SQLSTATE: XRE07, SQLERRMC: Could not perform operation because the database is not in replication master mode.
        connect slave: [localhost:4527//export/home/tmp/os136789/testingInMyDerbySandbox/db_slave/wombat]
        CONNECTED

        These test cases are located in ReplicationRun_Local_3_p2 as
        Id) insert,immediate - method() [failover response, connect response] ...
        A) false, false - testReplication_Local_3_p2_StateTests_smallInsert_sleepBeforeStopMaster() [XRE07, CONNECTED]
        B) false, true - testReplication_Local_3_p2_StateTests_smallInsert_immediateStopMaster() [HANGS!, -] (DISABLED)
        C) true, true - testReplication_Local_3_p2_StateTests_bigInsert_immediateStopMaster() [CONNECTED, CONNECTED]
        D) true, false - testReplication_Local_3_p2_StateTests_bigInsert_sleepBeforeStopMaster() [XRE07, CONNECTED]
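
The four cases above can also be written down as data, which makes the pattern easier to see: only the combination of a small insert and an immediate stopMaster hangs. This is a sketch for illustration; `FailoverCase` is a hypothetical class, not part of ReplicationRun_Local_3_p2, and the values are copied from the list above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the observed outcomes listed above.
final class FailoverCase {
    final String id;
    final boolean bigInsert;           // "insert" column
    final boolean immediateStopMaster; // "immediate" column (no sleep before stopMaster)
    final String failoverResponse;
    final String connectResponse;

    FailoverCase(String id, boolean bigInsert, boolean immediateStopMaster,
                 String failoverResponse, String connectResponse) {
        this.id = id;
        this.bigInsert = bigInsert;
        this.immediateStopMaster = immediateStopMaster;
        this.failoverResponse = failoverResponse;
        this.connectResponse = connectResponse;
    }

    static final List<FailoverCase> CASES = Arrays.asList(
        new FailoverCase("A", false, false, "XRE07", "CONNECTED"),
        new FailoverCase("B", false, true, "HANGS", "-"),
        new FailoverCase("C", true, true, "CONNECTED", "CONNECTED"),
        new FailoverCase("D", true, false, "XRE07", "CONNECTED"));

    /** Returns the recorded case for the given combination, or null if none matches. */
    static FailoverCase find(boolean bigInsert, boolean immediateStopMaster) {
        for (FailoverCase c : CASES) {
            if (c.bigInsert == bigInsert && c.immediateStopMaster == immediateStopMaster) {
                return c;
            }
        }
        return null;
    }
}
```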

        Ole Solberg made changes -
        Attachment 3617_NotForCommit.stat.txt [ 12380377 ]
        Attachment 3617_NotForCommit.diff.txt [ 12380376 ]
        Ole Solberg made changes -
        Attachment 3617_NotForCommit.diff.txt [ 12380376 ]
        Ole Solberg made changes -
        Attachment 3617_NotForCommit.stat.txt [ 12380377 ]
        Ole Solberg added a comment -

        Minor cleanup to the 3617_NotForCommit patch.

        Ole Solberg made changes -
        Attachment 3617_NotForCommit.stat.txt [ 12380379 ]
        Attachment 3617_NotForCommit.diff.txt [ 12380378 ]
        Myrna van Lunteren made changes -
        Affects Version/s 10.5.1.1 [ 12313771 ]
        Affects Version/s 10.5.0.0 [ 12313010 ]
        Knut Anders Hatlen added a comment -

        Triaged for 10.5.2. Marking as Unassigned due to lack of activity.

        Knut Anders Hatlen made changes -
        Urgency Normal
        Issue & fix info [Repro attached]
        Knut Anders Hatlen made changes -
        Assignee V.Narayanan [ narayanan ]
        Kathey Marsden made changes -
        Labels derby_triage10_5_2
        Gavin made changes -
        Workflow jira [ 12428903 ] Default workflow, editable Closed status [ 12799198 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Ole Solberg
          • Votes:
            0
            Watchers:
            0
