Derby
  1. Derby
  2. DERBY-3447

Shutdown on a database without stopping replication hangs

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.4.1.3, 10.5.1.1
    • Component/s: Replication
    • Labels:
      None
    1. Derby3447_v1.diff
      2 kB
      V.Narayanan
    2. Derby3447_v1.stat
      0.2 kB
      V.Narayanan
    3. Derby3447_v2.diff
      2 kB
      V.Narayanan
    4. Derby3447_v2.stat
      0.2 kB
      V.Narayanan
    5. Derby3447_v3.diff
      3 kB
      V.Narayanan
    6. Derby3447_v3.stat
      0.3 kB
      V.Narayanan
    7. Derby3447_v4.diff
      5 kB
      V.Narayanan
    8. Derby3447_v4.stat
      0.4 kB
      V.Narayanan
    9. Derby3447_v5.diff
      5 kB
      V.Narayanan
    10. Derby3447_v5.stat
      0.4 kB
      V.Narayanan

      Issue Links

        Activity

        Hide
        Jørgen Løland added a comment -

        I tried to start replication and then shutdown the master database, but did not get a hang. This may have been fixed in a recent commit. If you still get a hang, could you please submit a reproducible script?

        Show
        Jørgen Løland added a comment - I tried to start replication and then shutdown the master database, but did not get a hang. This may have been fixed in a recent commit. If you still get a hang, could you please submit a reproducible script?
        Hide
        Jørgen Løland added a comment -

        When replication has been set up in embedded using ij,

        connect 'jdbc:derby:;shutdown=true'
        quit;

        works fine on the master, whereas only

        quit;

        hangs. I'm not sure what the difference here is, but the thread stack reports that the log shipper thread is still alive. I think we shoul add cleanup code in MC#stop to get rid of this thread in the case that the module is unbooted without calling stopMaster or failover first. That happens e.g. for system shutdown.

        Show
        Jørgen Løland added a comment - When replication has been set up in embedded using ij, connect 'jdbc:derby:;shutdown=true' quit; works fine on the master, whereas only quit; hangs. I'm not sure what the difference here is, but the thread stack reports that the log shipper thread is still alive. I think we shoul add cleanup code in MC#stop to get rid of this thread in the case that the module is unbooted without calling stopMaster or failover first. That happens e.g. for system shutdown.
        Hide
        Øystein Grøvlen added a comment -

        The same operation also hangs on the slave. Neither the log receiver thread nor the boot thread seems to have been terminated.

        Show
        Øystein Grøvlen added a comment - The same operation also hangs on the slave. Neither the log receiver thread nor the boot thread seems to have been terminated.
        Hide
        V.Narayanan added a comment -

        Modified the stop() method to call stopMaster in the attached patch.

        Ran the following cases as repros, seems to run fine.

        Repro 1
        -------

        exit ij with quit;

        ij version 10.4
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true';
        ij> call syscs_util.syscs_freeze_database();
        0 rows inserted/updated/deleted
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort-8001';
        ERROR XJ028: The URL 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort-8001' is not properly formed.
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION1)> quit;
        vn@vn-laptop:~/work/workspaces/Derby3447/master$

        Repro 2
        -------

        exit ij, with system shutdown

        connect 'jdbc:derby:;shutdown=true'
        and
        quit

        vn@vn-laptop:~/work/workspaces/Derby3447/master$ java org.apache.derby.tools.ij
        ij version 10.4
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true';
        ij> call syscs_util.syscs_freeze_database();
        0 rows inserted/updated/deleted
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION1)> connect 'jdbc:derby:;shutdown=true';
        ERROR XJ015: Derby system shutdown.
        ij(CONNECTION1)> quit;
        vn@vn-laptop:~/work/workspaces/Derby3447/master$

        Repro 3
        --------

        exit ij, with database shutdown.

        vn@vn-laptop:~/work/workspaces/Derby3447/master$ java org.apache.derby.tools.ij
        ij version 10.4
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true';
        ij> call syscs_util.syscs_freeze_database();
        0 rows inserted/updated/deleted
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION1)> connect 'jdbc:derby:masterDB;shutdown=true';
        ERROR 08006: Database 'masterDB' shutdown.
        ij(CONNECTION1)> exit;
        vn@vn-laptop:~/work/workspaces/Derby3447/master$

        Show
        V.Narayanan added a comment - Modified the stop() method to call stopMaster in the attached patch. Ran the following cases as repros, seems to run fine. Repro 1 ------- exit ij with quit; ij version 10.4 ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true'; ij> call syscs_util.syscs_freeze_database(); 0 rows inserted/updated/deleted ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort-8001'; ERROR XJ028: The URL 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort-8001' is not properly formed. ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001'; ij(CONNECTION1)> quit; vn@vn-laptop:~/work/workspaces/Derby3447/master$ Repro 2 ------- exit ij, with system shutdown connect 'jdbc:derby:;shutdown=true' and quit vn@vn-laptop:~/work/workspaces/Derby3447/master$ java org.apache.derby.tools.ij ij version 10.4 ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true'; ij> call syscs_util.syscs_freeze_database(); 0 rows inserted/updated/deleted ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001'; ij(CONNECTION1)> connect 'jdbc:derby:;shutdown=true'; ERROR XJ015: Derby system shutdown. ij(CONNECTION1)> quit; vn@vn-laptop:~/work/workspaces/Derby3447/master$ Repro 3 -------- exit ij, with database shutdown. vn@vn-laptop:~/work/workspaces/Derby3447/master$ java org.apache.derby.tools.ij ij version 10.4 ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true'; ij> call syscs_util.syscs_freeze_database(); 0 rows inserted/updated/deleted ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001'; ij(CONNECTION1)> connect 'jdbc:derby:masterDB;shutdown=true'; ERROR 08006: Database 'masterDB' shutdown. ij(CONNECTION1)> exit; vn@vn-laptop:~/work/workspaces/Derby3447/master$
        Hide
        Øystein Grøvlen added a comment -

        Hi, I tried the patch, but I still get I hang when I quit an ij that is running a master.
        (Since MasterController has moved, I modified the path in the diff file before applying the patch.)
        (I am runnig with classes.)

        Show
        Øystein Grøvlen added a comment - Hi, I tried the patch, but I still get I hang when I quit an ij that is running a master. (Since MasterController has moved, I modified the path in the diff file before applying the patch.) (I am runnig with classes.)
        Hide
        V.Narayanan added a comment -

        >Hi, I tried the patch, but I still get I hang when I quit an ij that is running a master.

        You just did a quit; and it hanged? Surprising the repro worked for me. I will check this.

        >(Since MasterController has moved, I modified the path in the diff file before applying the patch.)

        Just modified the file names in the diff?

        >(I am runnig with classes.)

        Would this be different from running with the jars?

        Show
        V.Narayanan added a comment - >Hi, I tried the patch, but I still get I hang when I quit an ij that is running a master. You just did a quit; and it hanged? Surprising the repro worked for me. I will check this. >(Since MasterController has moved, I modified the path in the diff file before applying the patch.) Just modified the file names in the diff? >(I am runnig with classes.) Would this be different from running with the jars?
        Hide
        V.Narayanan added a comment -

        I am not able to reproduce this problem on both embedded and network/client

        Network Client

        vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ java org.apache.derby.tools.ij
        ij version 10.5
        ij> connect 'jdbc:derby://localhost:1527/replicationdb';
        ij> connect 'jdbc:derby://localhost:1527/replicationdb;startMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION1)> quit;
        vn@vn-laptop:~/work/workspaces/Derby3447_1/master$

        Embedded

        vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ java org.apache.derby.tools.ij
        ij version 10.5
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true';
        ij> call SYSCS_UTIL.SYSCS_FREEZE_DATABASE();
        0 rows inserted/updated/deleted
        ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION1)> quit;
        vn@vn-laptop:~/work/workspaces/Derby3447_1/master$

        Can you pls re-run and confirm on the current workspace to see if this problem is occurring?

        Show
        V.Narayanan added a comment - I am not able to reproduce this problem on both embedded and network/client Network Client vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ java org.apache.derby.tools.ij ij version 10.5 ij> connect 'jdbc:derby://localhost:1527/replicationdb'; ij> connect 'jdbc:derby://localhost:1527/replicationdb;startMaster=true;slaveHost=localhost;slavePort=8001'; ij(CONNECTION1)> quit; vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ Embedded vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ java org.apache.derby.tools.ij ij version 10.5 ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true'; ij> call SYSCS_UTIL.SYSCS_FREEZE_DATABASE(); 0 rows inserted/updated/deleted ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001'; ij(CONNECTION1)> quit; vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ Can you pls re-run and confirm on the current workspace to see if this problem is occurring?
        Hide
        Øystein Grøvlen added a comment -

        It works for me if I turn off authentication and SQL authorization.

        Show
        Øystein Grøvlen added a comment - It works for me if I turn off authentication and SQL authorization.
        Hide
        V.Narayanan added a comment -

        I am not able to reproduce this problem on the network client (surprises me, Maybe I did something wrong!!).
        But am able to reproduce this in the embedded mode. I did a jstack to find out what is hanging and found the
        following relevant

        -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        master

        "Attach Listener" daemon prio=10 tid=0x08312000 nid=0x2086 waiting on condition [0x00000000..0x00000000]
        java.lang.Thread.State: RUNNABLE

        "DestroyJavaVM" prio=10 tid=0x08059800 nid=0x2004 waiting on condition [0x00000000..0xb7dfc090]
        java.lang.Thread.State: RUNNABLE

        "derby.master.logger-mydb" prio=10 tid=0x08311800 nid=0x203b in Object.wait() [0xb5594000..0xb55950c0]
        java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper)
          at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.run(AsynchronousLogShipper.java:203)
        • locked <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper)

        "derby.rawStoreDaemon" daemon prio=10 tid=0x082b5c00 nid=0x2015 in Object.wait() [0xb5607000..0xb5608040]
        java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c61f0a0> (a org.apache.derby.impl.services.daemon.BasicDaemon)
          at org.apache.derby.impl.services.daemon.BasicDaemon.rest(BasicDaemon.java:571)
        • locked <0x8c61f0a0> (a org.apache.derby.impl.services.daemon.BasicDaemon)
          at org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:388)
          at java.lang.Thread.run(Thread.java:619)

        "Timer-0" daemon prio=10 tid=0x08294000 nid=0x2014 in Object.wait() [0xb5666000..0xb5666fc0]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c5d47b0> (a java.util.TaskQueue)
          at java.lang.Object.wait(Object.java:485)
          at java.util.TimerThread.mainLoop(Timer.java:483)
        • locked <0x8c5d47b0> (a java.util.TaskQueue)
          at java.util.TimerThread.run(Timer.java:462)

        "derby.antiGC" daemon prio=10 tid=0x08213c00 nid=0x2013 in Object.wait() [0xb56ba000..0xb56baf40]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c5870a8> (a org.apache.derby.impl.services.monitor.AntiGC)
          at java.lang.Object.wait(Object.java:485)
          at org.apache.derby.impl.services.monitor.AntiGC.run(BaseMonitor.java:2196)
        • locked <0x8c5870a8> (a org.apache.derby.impl.services.monitor.AntiGC)
          at java.lang.Thread.run(Thread.java:619)

        "Low Memory Detector" daemon prio=10 tid=0x0808e000 nid=0x200a runnable [0x00000000..0x00000000]
        java.lang.Thread.State: RUNNABLE

        "CompilerThread0" daemon prio=10 tid=0x0808c400 nid=0x2009 waiting on condition [0x00000000..0xb5a5b858]
        java.lang.Thread.State: RUNNABLE

        "Signal Dispatcher" daemon prio=10 tid=0x0808b000 nid=0x2008 runnable [0x00000000..0xb5aace20]
        java.lang.Thread.State: RUNNABLE

        "Finalizer" daemon prio=10 tid=0x08082800 nid=0x2007 in Object.wait() [0xb5b3c000..0xb5b3d0c0]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c551968> (a java.lang.ref.ReferenceQueue$Lock)
          at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        • locked <0x8c551968> (a java.lang.ref.ReferenceQueue$Lock)
          at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
          at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

        "Reference Handler" daemon prio=10 tid=0x08081800 nid=0x2006 in Object.wait() [0xb5b8d000..0xb5b8e040]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c5519f8> (a java.lang.ref.Reference$Lock)
          at java.lang.Object.wait(Object.java:485)
          at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        • locked <0x8c5519f8> (a java.lang.ref.Reference$Lock)

        "VM Thread" prio=10 tid=0x08080000 nid=0x2005 runnable

        "VM Periodic Task Thread" prio=10 tid=0x0808f800 nid=0x200b waiting on condition

        JNI global references: 690

        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

        Slave

        "Attach Listener" daemon prio=10 tid=0x0820ac00 nid=0x2092 runnable [0x00000000..0x00000000]
        java.lang.Thread.State: RUNNABLE

        "DestroyJavaVM" prio=10 tid=0x08059800 nid=0x2029 waiting on condition [0x00000000..0xb7d66090]
        java.lang.Thread.State: RUNNABLE

        "derby.slave.logger-mydb" prio=10 tid=0x0820f000 nid=0x203a runnable [0xb54ac000..0xb54acf40]
        java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.net.SocketInputStream.read(SocketInputStream.java:182)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2247)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2540)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2550)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        at org.apache.derby.impl.store.replication.net.SocketConnection.readMessage(SocketConnection.java:84)
        at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive.readMessage(ReplicationMessageReceive.java:387)
        at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:476)

        "derby.rawStoreDaemon" daemon prio=10 tid=0x083bc800 nid=0x2039 in Object.wait() [0xb5503000..0xb5503fc0]
        java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c1201b8> (a org.apache.derby.impl.services.daemon.BasicDaemon)
          at org.apache.derby.impl.services.daemon.BasicDaemon.rest(BasicDaemon.java:571)
        • locked <0x8c1201b8> (a org.apache.derby.impl.services.daemon.BasicDaemon)
          at org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:388)
          at java.lang.Thread.run(Thread.java:619)

        "derby.slave.boot-mydb" prio=10 tid=0x083bc400 nid=0x2038 in Object.wait() [0xb5570000..0xb5571040]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c120268> (a java.lang.Object)
          at java.lang.Object.wait(Object.java:485)
          at org.apache.derby.impl.store.raw.log.LogToFile.recover(LogToFile.java:712)
        • locked <0x8c120268> (a java.lang.Object)
          at org.apache.derby.impl.store.raw.RawStore.boot(RawStore.java:334)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)
          at org.apache.derby.impl.store.access.RAMAccessManager.boot(RAMAccessManager.java:1019)
          at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999)
          at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)
          at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553)
          at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)
          at org.apache.derby.impl.db.BasicDatabase.bootStore(BasicDatabase.java:789)
          at org.apache.derby.impl.db.BasicDatabase.boot(BasicDatabase.java:205)
          at org.apache.derby.impl.db.SlaveDatabase.bootBasicDatabase(SlaveDatabase.java:421)
          at org.apache.derby.impl.db.SlaveDatabase.access$000(SlaveDatabase.java:70)
          at org.apache.derby.impl.db.SlaveDatabase$SlaveDatabaseBootThread.run(SlaveDatabase.java:308)
          at java.lang.Thread.run(Thread.java:619)

        "Timer-0" daemon prio=10 tid=0x0816d800 nid=0x2036 in Object.wait() [0xb55cf000..0xb55d0140]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c5d4980> (a java.util.TaskQueue)
          at java.lang.Object.wait(Object.java:485)
          at java.util.TimerThread.mainLoop(Timer.java:483)
        • locked <0x8c5d4980> (a java.util.TaskQueue)
          at java.util.TimerThread.run(Timer.java:462)

        "derby.antiGC" daemon prio=10 tid=0x08062400 nid=0x2035 in Object.wait() [0xb5623000..0xb5623dc0]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c587550> (a org.apache.derby.impl.services.monitor.AntiGC)
          at java.lang.Object.wait(Object.java:485)
          at org.apache.derby.impl.services.monitor.AntiGC.run(BaseMonitor.java:2196)
        • locked <0x8c587550> (a org.apache.derby.impl.services.monitor.AntiGC)
          at java.lang.Thread.run(Thread.java:619)

        "Low Memory Detector" daemon prio=10 tid=0x0808e000 nid=0x202f runnable [0x00000000..0x00000000]
        java.lang.Thread.State: RUNNABLE

        "CompilerThread0" daemon prio=10 tid=0x0808c400 nid=0x202e waiting on condition [0x00000000..0xb59c59d8]
        java.lang.Thread.State: RUNNABLE

        "Signal Dispatcher" daemon prio=10 tid=0x0808b000 nid=0x202d runnable [0x00000000..0xb5a16ca0]
        java.lang.Thread.State: RUNNABLE

        "Finalizer" daemon prio=10 tid=0x08082800 nid=0x202c in Object.wait() [0xb5aa6000..0xb5aa7040]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c551960> (a java.lang.ref.ReferenceQueue$Lock)
          at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        • locked <0x8c551960> (a java.lang.ref.ReferenceQueue$Lock)
          at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
          at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

        "Reference Handler" daemon prio=10 tid=0x08081800 nid=0x202b in Object.wait() [0xb5af7000..0xb5af80c0]
        java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c5519f0> (a java.lang.ref.Reference$Lock)
          at java.lang.Object.wait(Object.java:485)
          at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        • locked <0x8c5519f0> (a java.lang.ref.Reference$Lock)

        "VM Thread" prio=10 tid=0x08080000 nid=0x202a runnable

        "VM Periodic Task Thread" prio=10 tid=0x0808f800 nid=0x2030 waiting on condition

        JNI global references: 732
        ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

        Show
        V.Narayanan added a comment - I am not able to reproduce this problem on the network client (surprises me, Maybe I did something wrong!!). But am able to reproduce this in the embedded mode. I did a jstack to find out what is hanging and found the following relevant ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- master "Attach Listener" daemon prio=10 tid=0x08312000 nid=0x2086 waiting on condition [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "DestroyJavaVM" prio=10 tid=0x08059800 nid=0x2004 waiting on condition [0x00000000..0xb7dfc090] java.lang.Thread.State: RUNNABLE "derby.master.logger-mydb" prio=10 tid=0x08311800 nid=0x203b in Object.wait() [0xb5594000..0xb55950c0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper) at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.run(AsynchronousLogShipper.java:203) locked <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper) "derby.rawStoreDaemon" daemon prio=10 tid=0x082b5c00 nid=0x2015 in Object.wait() [0xb5607000..0xb5608040] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c61f0a0> (a org.apache.derby.impl.services.daemon.BasicDaemon) at org.apache.derby.impl.services.daemon.BasicDaemon.rest(BasicDaemon.java:571) locked <0x8c61f0a0> (a org.apache.derby.impl.services.daemon.BasicDaemon) at org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:388) at java.lang.Thread.run(Thread.java:619) "Timer-0" daemon prio=10 tid=0x08294000 nid=0x2014 in Object.wait() [0xb5666000..0xb5666fc0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c5d47b0> (a java.util.TaskQueue) at java.lang.Object.wait(Object.java:485) at java.util.TimerThread.mainLoop(Timer.java:483) locked <0x8c5d47b0> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "derby.antiGC" daemon prio=10 tid=0x08213c00 nid=0x2013 in Object.wait() [0xb56ba000..0xb56baf40] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c5870a8> (a org.apache.derby.impl.services.monitor.AntiGC) at java.lang.Object.wait(Object.java:485) at org.apache.derby.impl.services.monitor.AntiGC.run(BaseMonitor.java:2196) locked <0x8c5870a8> (a org.apache.derby.impl.services.monitor.AntiGC) at java.lang.Thread.run(Thread.java:619) "Low Memory Detector" daemon prio=10 tid=0x0808e000 nid=0x200a runnable [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x0808c400 nid=0x2009 waiting on condition [0x00000000..0xb5a5b858] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x0808b000 nid=0x2008 runnable [0x00000000..0xb5aace20] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x08082800 nid=0x2007 in Object.wait() [0xb5b3c000..0xb5b3d0c0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c551968> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) locked <0x8c551968> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x08081800 nid=0x2006 in Object.wait() [0xb5b8d000..0xb5b8e040] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c5519f8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) locked <0x8c5519f8> (a java.lang.ref.Reference$Lock) "VM Thread" prio=10 tid=0x08080000 nid=0x2005 runnable "VM Periodic Task Thread" prio=10 tid=0x0808f800 nid=0x200b waiting on condition JNI global references: 690 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Slave "Attach Listener" daemon prio=10 tid=0x0820ac00 nid=0x2092 runnable [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "DestroyJavaVM" prio=10 tid=0x08059800 nid=0x2029 waiting on condition [0x00000000..0xb7d66090] java.lang.Thread.State: RUNNABLE "derby.slave.logger-mydb" prio=10 tid=0x0820f000 nid=0x203a runnable [0xb54ac000..0xb54acf40] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.net.SocketInputStream.read(SocketInputStream.java:182) at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2247) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2540) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2550) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at org.apache.derby.impl.store.replication.net.SocketConnection.readMessage(SocketConnection.java:84) at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive.readMessage(ReplicationMessageReceive.java:387) at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:476) "derby.rawStoreDaemon" daemon prio=10 tid=0x083bc800 nid=0x2039 in Object.wait() [0xb5503000..0xb5503fc0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c1201b8> (a org.apache.derby.impl.services.daemon.BasicDaemon) at org.apache.derby.impl.services.daemon.BasicDaemon.rest(BasicDaemon.java:571) locked <0x8c1201b8> (a org.apache.derby.impl.services.daemon.BasicDaemon) at org.apache.derby.impl.services.daemon.BasicDaemon.run(BasicDaemon.java:388) at java.lang.Thread.run(Thread.java:619) "derby.slave.boot-mydb" prio=10 tid=0x083bc400 nid=0x2038 in Object.wait() [0xb5570000..0xb5571040] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c120268> (a java.lang.Object) at java.lang.Object.wait(Object.java:485) at org.apache.derby.impl.store.raw.log.LogToFile.recover(LogToFile.java:712) locked <0x8c120268> (a java.lang.Object) at org.apache.derby.impl.store.raw.RawStore.boot(RawStore.java:334) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999) at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427) at org.apache.derby.impl.store.access.RAMAccessManager.boot(RAMAccessManager.java:1019) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999) at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427) at org.apache.derby.impl.db.BasicDatabase.bootStore(BasicDatabase.java:789) at org.apache.derby.impl.db.BasicDatabase.boot(BasicDatabase.java:205) at org.apache.derby.impl.db.SlaveDatabase.bootBasicDatabase(SlaveDatabase.java:421) at org.apache.derby.impl.db.SlaveDatabase.access$000(SlaveDatabase.java:70) at org.apache.derby.impl.db.SlaveDatabase$SlaveDatabaseBootThread.run(SlaveDatabase.java:308) at java.lang.Thread.run(Thread.java:619) "Timer-0" daemon prio=10 tid=0x0816d800 nid=0x2036 in Object.wait() [0xb55cf000..0xb55d0140] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c5d4980> (a java.util.TaskQueue) at java.lang.Object.wait(Object.java:485) at java.util.TimerThread.mainLoop(Timer.java:483) locked <0x8c5d4980> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "derby.antiGC" daemon prio=10 tid=0x08062400 nid=0x2035 in Object.wait() [0xb5623000..0xb5623dc0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c587550> (a org.apache.derby.impl.services.monitor.AntiGC) at java.lang.Object.wait(Object.java:485) at org.apache.derby.impl.services.monitor.AntiGC.run(BaseMonitor.java:2196) locked <0x8c587550> (a org.apache.derby.impl.services.monitor.AntiGC) at java.lang.Thread.run(Thread.java:619) "Low Memory Detector" daemon prio=10 tid=0x0808e000 nid=0x202f runnable [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x0808c400 nid=0x202e waiting on condition [0x00000000..0xb59c59d8] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x0808b000 nid=0x202d runnable [0x00000000..0xb5a16ca0] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x08082800 nid=0x202c in Object.wait() [0xb5aa6000..0xb5aa7040] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c551960> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) locked <0x8c551960> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x08081800 nid=0x202b in Object.wait() [0xb5af7000..0xb5af80c0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c5519f0> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) locked <0x8c5519f0> (a java.lang.ref.Reference$Lock) "VM Thread" prio=10 tid=0x08080000 nid=0x202a runnable "VM Periodic Task Thread" prio=10 tid=0x0808f800 nid=0x2030 waiting on condition JNI global references: 732 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        Hide
        V.Narayanan added a comment -

        This shows that

        "derby.master.logger-mydb" prio=10 tid=0x08311800 nid=0x203b in Object.wait() [0xb5594000..0xb55950c0]
        java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

        • waiting on <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper)
          at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.run(AsynchronousLogShipper.java:203)
        • locked <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper)

        the log shipper thread is still running causing the hang in the master

        This shows that

        derby.slave.logger-mydb" prio=10 tid=0x0820f000 nid=0x203a runnable [0xb54ac000..0xb54acf40]
        java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.net.SocketInputStream.read(SocketInputStream.java:182)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2247)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2540)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2550)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        at org.apache.derby.impl.store.replication.net.SocketConnection.readMessage(SocketConnection.java:84)
        at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive.readMessage(ReplicationMessageReceive.java:387)
        at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:476)

        The SlaveLogReceiver thread is hanging on the read method of the socket it is listening on causing the hang on the slave.

        The questions I am searching for answers now is

        1) Why is this happening only in the embedded mode?

        2) How does turning authentication on affect this?

        Show
        V.Narayanan added a comment - This shows that "derby.master.logger-mydb" prio=10 tid=0x08311800 nid=0x203b in Object.wait() [0xb5594000..0xb55950c0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) waiting on <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper) at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.run(AsynchronousLogShipper.java:203) locked <0x8c0b6180> (a org.apache.derby.impl.store.replication.master.AsynchronousLogShipper) the log shipper thread is still running causing the hang in the master This shows that derby.slave.logger-mydb" prio=10 tid=0x0820f000 nid=0x203a runnable [0xb54ac000..0xb54acf40] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.net.SocketInputStream.read(SocketInputStream.java:182) at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2247) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2540) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2550) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at org.apache.derby.impl.store.replication.net.SocketConnection.readMessage(SocketConnection.java:84) at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive.readMessage(ReplicationMessageReceive.java:387) at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:476) The SlaveLogReceiver thread is hanging on the read method of the socket it is listening on causing the hang on the slave. The questions I am searching for answers now is 1) Why is this happening only in the embedded mode? 2) How does turning authentication on affect this?
        Hide
        V.Narayanan added a comment -

        Fixed the earlier broken patch for the new trunk

        Show
        V.Narayanan added a comment - Fixed the earlier broken patch for the new trunk
        Hide
        Øystein Grøvlen added a comment -

        V.Narayanan (JIRA) wrote:
        > I am not able to reproduce this problem on the network client (surprises me, Maybe I did something wrong!!).

        This may be ovious, but to be sure: Note that when embedded a quit in ij will stop the VM that contains the master. If client/server, the master is running in a separate VM. I not quite sure you will necessarily get the same code path by stopping a server.

        Show
        Øystein Grøvlen added a comment - V.Narayanan (JIRA) wrote: > I am not able to reproduce this problem on the network client (surprises me, Maybe I did something wrong!!). This may be ovious, but to be sure: Note that when embedded a quit in ij will stop the VM that contains the master. If client/server, the master is running in a separate VM. I not quite sure you will necessarily get the same code path by stopping a server.
        Hide
        V.Narayanan added a comment -

        I understand that the code path will be different in the case of the
        client. But I am confused about its import.

        Are you trying to say that the problem of the log shipper thread
        clean-up not happening might still be there
        in the client but our current way of trying to reproduce the problem is
        not right? (or)

        Are you trying to say that this problem need not be looked from the
        client perspective at all (I think you don't mean
        this but just to be sure)

        I am trying to work with the embedded reproducible for now.

        But from your comment it looks like finding a solution in the embedded
        case might not necessarily mean that this
        problem has been solved in all cases?

        Show
        V.Narayanan added a comment - I understand that the code path will be different in the case of the client. But I am confused about its import. Are you trying to say that the problem of the log shipper thread clean-up not happening might still be there in the client but our current way of trying to reproduce the problem is not right? (or) Are you trying to say that this problem need not be looked from the client perspective at all (I think you don't mean this but just to be sure) I am trying to work with the embedded reproducible for now. But from your comment it looks like finding a solution in the embedded case might not necessarily mean that this problem has been solved in all cases?
        Hide
        Øystein Grøvlen added a comment -

        V.Narayanan (JIRA) wrote:
        > Are you trying to say that the problem of the log shipper thread
        > clean-up not happening might still be there
        > in the client but our current way of trying to reproduce the problem is
        > not right? (or)

        No, my point is that there will never be a log shipper problem in the client, because the log shipper is running in the server.

        Show
        Øystein Grøvlen added a comment - V.Narayanan (JIRA) wrote: > Are you trying to say that the problem of the log shipper thread > clean-up not happening might still be there > in the client but our current way of trying to reproduce the problem is > not right? (or) No, my point is that there will never be a log shipper problem in the client, because the log shipper is running in the server.
        Hide
        Øystein Grøvlen added a comment -

        With the Derby3447_v2.diff patch, the replication test suite fails as follows:

        > java junit.textui.TestRunner -noloading org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationSuite ...F...
        Time: 536.931
        There was 1 failure:
        1) testReplication_Local_StateTest_part1_1(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1)junit.framework.AssertionFailedError: jdbc:derby://localhost:4527//export/tmp/oysteing/derby-repl/testing_repl/db_slave/wombat;stopSlave=true failed: -1 XRE11 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave/export/tmp/oysteing/derby-repl/testing_repl/db_slave/wombatXRE11
        at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1._testPostStartedMasterAndSlave_StopSlave(ReplicationRun_Local_StateTest_part1_1.java:222)
        at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1.testReplication_Local_StateTest_part1_1(ReplicationRun_Local_StateTest_part1_1.java:130)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:101)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
        at junit.extensions.TestSetup.run(TestSetup.java:25)

        FAILURES!!!
        Tests run: 6, Failures: 1, Errors: 0

        Show
        Øystein Grøvlen added a comment - With the Derby3447_v2.diff patch, the replication test suite fails as follows: > java junit.textui.TestRunner -noloading org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationSuite ...F... Time: 536.931 There was 1 failure: 1) testReplication_Local_StateTest_part1_1(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1)junit.framework.AssertionFailedError: jdbc:derby://localhost:4527//export/tmp/oysteing/derby-repl/testing_repl/db_slave/wombat;stopSlave=true failed: -1 XRE11 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave/export/tmp/oysteing/derby-repl/testing_repl/db_slave/wombatXRE11 at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1._testPostStartedMasterAndSlave_StopSlave(ReplicationRun_Local_StateTest_part1_1.java:222) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1.testReplication_Local_StateTest_part1_1(ReplicationRun_Local_StateTest_part1_1.java:130) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:101) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) FAILURES!!! Tests run: 6, Failures: 1, Errors: 0
        Hide
        V.Narayanan added a comment -

        Thank you for the inputs Oystein.

        I have been single stepping this issue and will briefly try to describe my progress so far.

        I tried calling the stopMaster() method from RawStore#stop() to check if this will help me
        in determining if this would help result in a proper master shutdown. It did not.

        I resorted to good old single stepping with my buddy Netbeans

        I placed breakpoints in the following places

        1) ij#quit() line no 342
        2) TopService#shutdown() line no 327 (I did this after I did one round of single stepping with 3) below)
        3) MasterController#stopMaster() line no 262

        I basically started ij with the following command

        java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=9009 org.apache.derby.tools.ij

        and connected to it through netbeans

        I basically observed the following when authentication is enabled

        1) MasterController#stopMaster() is never called (called when authentication is not enabled)
        2) TopService#shutdown() is never called with authentication enabled ( I tried this twice and observed the
        same this. Shouldn't the modules that are currently loaded be shutdown when authentication is enabled too?
        I am not sure how replication code can influence this.)

        I need to basically take it from here. I decided to report because I thought maybe people
        could guide me as to why TopService#shutdown() is not called with authentication enabled and as to whether
        this is something they know of?

        TopService#shutdown() is however called when I do a connect 'jdbc:derby:;shutdown=true;user=oystein;password=pass';

        Show
        V.Narayanan added a comment - Thank you for the inputs Oystein. I have been single stepping this issue and will briefly try to describe my progress so far. I tried calling the stopMaster() method from RawStore#stop() to check if this will help me in determining if this would help result in a proper master shutdown. It did not. I resorted to good old single stepping with my buddy Netbeans I placed breakpoints in the following places 1) ij#quit() line no 342 2) TopService#shutdown() line no 327 (I did this after I did one round of single stepping with 3) below) 3) MasterController#stopMaster() line no 262 I basically started ij with the following command java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=9009 org.apache.derby.tools.ij and connected to it through netbeans I basically observed the following when authentication is enabled 1) MasterController#stopMaster() is never called (called when authentication is not enabled) 2) TopService#shutdown() is never called with authentication enabled ( I tried this twice and observed the same this. Shouldn't the modules that are currently loaded be shutdown when authentication is enabled too? I am not sure how replication code can influence this.) I need to basically take it from here. I decided to report because I thought maybe people could guide me as to why TopService#shutdown() is not called with authentication enabled and as to whether this is something they know of? TopService#shutdown() is however called when I do a connect 'jdbc:derby:;shutdown=true;user=oystein;password=pass';
        Hide
        V.Narayanan added a comment -

        I did a little bit more digging on a more generalized behaviour when authentication
        is enabled in the embedded mode and raised https://issues.apache.org/jira/browse/DERBY-3541
        to address this issue.

        This issue depends on DERBY-3451 to be resolved.

        Show
        V.Narayanan added a comment - I did a little bit more digging on a more generalized behaviour when authentication is enabled in the embedded mode and raised https://issues.apache.org/jira/browse/DERBY-3541 to address this issue. This issue depends on DERBY-3451 to be resolved.
        Hide
        V.Narayanan added a comment -

        When I found out about Derby-3451 I was thinking why the quit without
        starting replication does not seem to hang. I found out that it does not
        hang because the other threads used in derby are Daemon threads .

        see BaseMonitor#getDaemonThread (line no 2080)

        relevant lines

        Thread t = new Thread(daemonGroup, task, "derby.".concat(name));
        t.setDaemon(true);

        found "The Java Virtual Machine exits when the only threads running are all daemon threads."
        here http://java.sun.com/j2se/1.3/docs/api/java/lang/Thread.html#setDaemon(boolean)

        Show
        V.Narayanan added a comment - When I found out about Derby-3451 I was thinking why the quit without starting replication does not seem to hang. I found out that it does not hang because the other threads used in derby are Daemon threads . see BaseMonitor#getDaemonThread (line no 2080) relevant lines Thread t = new Thread(daemonGroup, task, "derby.".concat(name)); t.setDaemon(true); found "The Java Virtual Machine exits when the only threads running are all daemon threads." here http://java.sun.com/j2se/1.3/docs/api/java/lang/Thread.html#setDaemon(boolean )
        Hide
        V.Narayanan added a comment -

        I believe it is not right for the replication threads to prevent the jvm running
        an embedded application (like ij) from shutting down. These threads should also
        be brought down along with the jvm. Hence I believe that the replication
        threads should be daemon threads so that when the vm exits these threads
        are also brought down along with the vm.

        Converting the following replication related threads to
        Daemon threads resolved the problem of these threads blocking
        the ij process from executing a quit; on both the master and
        the slave.

        1) Slave database boot thread (slave)
        2) Log receiver thread (slave)
        3) Log shipper thread (master)

        I will run the replication suite of tests and shall revert back with
        results for this patch.

        I request for this patch to be please considered for reviews/comments.

        Show
        V.Narayanan added a comment - I believe it is not right for the replication threads to prevent the jvm running an embedded application (like ij) from shutting down. These threads should also be brought down along with the jvm. Hence I believe that the replication threads should be daemon threads so that when the vm exits these threads are also brought down along with the vm. Converting the following replication related threads to Daemon threads resolved the problem of these threads blocking the ij process from executing a quit; on both the master and the slave. 1) Slave database boot thread (slave) 2) Log receiver thread (slave) 3) Log shipper thread (master) I will run the replication suite of tests and shall revert back with results for this patch. I request for this patch to be please considered for reviews/comments.
        Hide
        V.Narayanan added a comment - - edited

        I was able to quit ij with authentication enabled on both the master
        and slave with patch v3.

        On the Master
        -----------------

        vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ java org.apache.derby.tools.ij
        ij version 10.5
        ij> connect 'jdbc:derby:mydb;create=true;user=oystein;password=pass';
        ij> call syscs_util.syscs_freeze_database();
        0 rows inserted/updated/deleted
        ij> connect 'jdbc:derby:mydb;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001';
        ij(CONNECTION1)> create table narayanan(i int, name varchar(30));
        0 rows inserted/updated/deleted
        ij(CONNECTION1)> insert into narayanan values(2,'n1');
        1 row inserted/updated/deleted
        ij(CONNECTION1)> insert into narayanan values(3, 'n2');
        1 row inserted/updated/deleted
        ij(CONNECTION1)> quit;
        vn@vn-laptop:~/work/workspaces/Derby3447_1/master$

        On the slave
        -------------

        ij version 10.5
        ij> connect 'jdbc:derby:mydb;user=oystein;password=pass;startSlave=true;slaveHost=localhost;slavePort=8001';
        ERROR XRE08: Replication slave mode started successfully for database 'mydb'. Connection refused because the database is in replication slave mode.
        ij> quit;
        vn@vn-laptop:~/work/workspaces/Derby3447_1/slave$

        I tried it in both orders also (i.e.) first trying a quit on the master and then a quit on the slave
        and vice versa.

        I also tried quit without executing any transactions on the master and in both orders.

        It worked fine for me in all the cases with patch v3.

        Show
        V.Narayanan added a comment - - edited I was able to quit ij with authentication enabled on both the master and slave with patch v3. On the Master ----------------- vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ java org.apache.derby.tools.ij ij version 10.5 ij> connect 'jdbc:derby:mydb;create=true;user=oystein;password=pass'; ij> call syscs_util.syscs_freeze_database(); 0 rows inserted/updated/deleted ij> connect 'jdbc:derby:mydb;user=oystein;password=pass;startMaster=true;slaveHost=localhost;slavePort=8001'; ij(CONNECTION1)> create table narayanan(i int, name varchar(30)); 0 rows inserted/updated/deleted ij(CONNECTION1)> insert into narayanan values(2,'n1'); 1 row inserted/updated/deleted ij(CONNECTION1)> insert into narayanan values(3, 'n2'); 1 row inserted/updated/deleted ij(CONNECTION1)> quit; vn@vn-laptop:~/work/workspaces/Derby3447_1/master$ On the slave ------------- ij version 10.5 ij> connect 'jdbc:derby:mydb;user=oystein;password=pass;startSlave=true;slaveHost=localhost;slavePort=8001'; ERROR XRE08: Replication slave mode started successfully for database 'mydb'. Connection refused because the database is in replication slave mode. ij> quit; vn@vn-laptop:~/work/workspaces/Derby3447_1/slave$ I tried it in both orders also (i.e.) first trying a quit on the master and then a quit on the slave and vice versa. I also tried quit without executing any transactions on the master and in both orders. It worked fine for me in all the cases with patch v3.
        Hide
        V.Narayanan added a comment -

        I ran the replication suite of tests and observed this failure. I am investigating this failure.

        java -Xmx512m -XX:MaxPermSize=128M junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationSuite
        ...F...
        Time: 542.146
        There was 1 failure:
        1) testReplication_Local_StateTest_part1_1(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1)junit.framework.AssertionFailedError: jdbc:derby://localhost:4527//home/vn/work/workspaces/Derby3447_1/tests/db_slave/wombat;stopSlave=true failed: -1 XRE11 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave/home/vn/work/workspaces/Derby3447_1/tests/db_slave/wombatXRE11
        at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1._testPostStartedMasterAndSlave_StopSlave(ReplicationRun_Local_StateTest_part1_1.java:222)
        at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1.testReplication_Local_StateTest_part1_1(ReplicationRun_Local_StateTest_part1_1.java:130)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:101)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
        at junit.extensions.TestSetup.run(TestSetup.java:23)

        FAILURES!!!
        Tests run: 6, Failures: 1, Errors: 0

        Show
        V.Narayanan added a comment - I ran the replication suite of tests and observed this failure. I am investigating this failure. java -Xmx512m -XX:MaxPermSize=128M junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationSuite ...F... Time: 542.146 There was 1 failure: 1) testReplication_Local_StateTest_part1_1(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1)junit.framework.AssertionFailedError: jdbc:derby://localhost:4527//home/vn/work/workspaces/Derby3447_1/tests/db_slave/wombat;stopSlave=true failed: -1 XRE11 DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE11, SQLERRMC: stopSlave/home/vn/work/workspaces/Derby3447_1/tests/db_slave/wombatXRE11 at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1._testPostStartedMasterAndSlave_StopSlave(ReplicationRun_Local_StateTest_part1_1.java:222) at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part1_1.testReplication_Local_StateTest_part1_1(ReplicationRun_Local_StateTest_part1_1.java:130) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:101) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23) FAILURES!!! Tests run: 6, Failures: 1, Errors: 0
        Hide
        Øystein Grøvlen added a comment -

        This error looks like what if run with classes instead of jar files. Is this the case for you?

        Show
        Øystein Grøvlen added a comment - This error looks like what if run with classes instead of jar files. Is this the case for you?
        Hide
        V.Narayanan added a comment -

        I ran my tests with the jars Oystein, I observed this error when run with jars

        Show
        V.Narayanan added a comment - I ran my tests with the jars Oystein, I observed this error when run with jars
        Hide
        V.Narayanan added a comment -

        Sorry the comment I deleted was a click too quick.

        The part of the replication test where the failure occurs tries to test a slave
        shutdown after the master has been taken down.

        I tried doing the above manually and observe the following

        ij> connect 'jdbc:derby://localhost:1528/replicationdb;user=oystein;password=pass;stopSlave=true';
        ERROR XRE41: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE41, SQLERRMC: XRE41
        ij> connect 'jdbc:derby://localhost:1528/replicationdb;user=oystein;password=pass;stopSlave=true';
        ERROR XRE42: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE42, SQLERRMC: replicationdbXRE42
        ij>

        So before the master is killed the slave take down attempt fails XRE41

        and after the master is killed it succeeds XRE42.

        Seems like the code is behaving as the tests ask it too. I will investigate further into this.

        Show
        V.Narayanan added a comment - Sorry the comment I deleted was a click too quick. The part of the replication test where the failure occurs tries to test a slave shutdown after the master has been taken down. I tried doing the above manually and observe the following ij> connect 'jdbc:derby://localhost:1528/replicationdb;user=oystein;password=pass;stopSlave=true'; ERROR XRE41: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE41, SQLERRMC: XRE41 ij> connect 'jdbc:derby://localhost:1528/replicationdb;user=oystein;password=pass;stopSlave=true'; ERROR XRE42: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE42, SQLERRMC: replicationdbXRE42 ij> So before the master is killed the slave take down attempt fails XRE41 and after the master is killed it succeeds XRE42. Seems like the code is behaving as the tests ask it too. I will investigate further into this.
        Hide
        V.Narayanan added a comment -

        I think I have a strong theory to explain this test failure,

        When the Network Server running the master is shutdown as is being done in the test it result in the stop method
        of the MasterController module being called. This causes the master to send a STOP message
        to the slave. This causes a slave shutdown.

        The cleanup code was added in the MasterController stop method as part of this patch.

        I think this is expected behaviour and XRE11 is the right message in this case. The NetworkServer
        running the master being shutdown results in the Replication master shutting down, this is similar
        to the stopMaster being called.

        I think the tests need to be fixed to accomodate this new and correct behaviour

        I will fix this test and submit the patch again

        Show
        V.Narayanan added a comment - I think I have a strong theory to explain this test failure, When the Network Server running the master is shutdown as is being done in the test it result in the stop method of the MasterController module being called. This causes the master to send a STOP message to the slave. This causes a slave shutdown. The cleanup code was added in the MasterController stop method as part of this patch. I think this is expected behaviour and XRE11 is the right message in this case. The NetworkServer running the master being shutdown results in the Replication master shutting down, this is similar to the stopMaster being called. I think the tests need to be fixed to accomodate this new and correct behaviour I will fix this test and submit the patch again
        Hide
        V.Narayanan added a comment -

        In my earlier comment I had written that I was getting an XRE42.
        I was getting a XRE42 because I had misunderstood killing the
        slave to mean doing a kill on the network server running the master.
        This does not result in a proper shutdown of the master. Hence
        resulting in a XRE42 (allowing a proper shutdown of the slave).

        Show
        V.Narayanan added a comment - In my earlier comment I had written that I was getting an XRE42. I was getting a XRE42 because I had misunderstood killing the slave to mean doing a kill on the network server running the master. This does not result in a proper shutdown of the master. Hence resulting in a XRE42 (allowing a proper shutdown of the slave).
        Hide
        V.Narayanan added a comment -

        Fixed the test that was failing when run with v3, as described in
        the comments above.

        Ran the replication tests and observed no failure

        vn@vn-laptop:~/work/workspaces/Derby3447_1/tests$ java -Xmx512m -XX:MaxPermSize=128M junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationSuite
        ......
        Time: 545.178

        OK (6 tests)

        Show
        V.Narayanan added a comment - Fixed the test that was failing when run with v3, as described in the comments above. Ran the replication tests and observed no failure vn@vn-laptop:~/work/workspaces/Derby3447_1/tests$ java -Xmx512m -XX:MaxPermSize=128M junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationSuite ...... Time: 545.178 OK (6 tests)
        Hide
        V.Narayanan added a comment -

        v4 applies cleanly to a freshworkspace. I had also run tests earlier before I submitted v4.
        I please request v4 to be considered for a commit if everything is OK.

        Show
        V.Narayanan added a comment - v4 applies cleanly to a freshworkspace. I had also run tests earlier before I submitted v4. I please request v4 to be considered for a commit if everything is OK.
        Hide
        Øystein Grøvlen added a comment -

        Thanks for the patch, Narayanan. Changes look good, and my brief
        testing shows that I am now able to quit ij when replication is
        running. However, it would be good if you could add some regression
        tests for this issue to the replication test suite.

        Before committing this patch, I want to get one question clarified:
        In MasterController#stop, if stopMaster fails, you will still write
        'Replication master role stopped for database' to derby.log. Can you
        be certain that this is the case regardless of what the error is?

        Some nitpicks you may fix if you have to go another round with this
        patch:

        • SlaveDatabase: I think sdbThread instead of sdb_Thread is more in
          line with the naming style used in the derby code.
        • MasterController: The casting to Thread does not seem to be
          necessary.
        Show
        Øystein Grøvlen added a comment - Thanks for the patch, Narayanan. Changes look good, and my brief testing shows that I am now able to quit ij when replication is running. However, it would be good if you could add some regression tests for this issue to the replication test suite. Before committing this patch, I want to get one question clarified: In MasterController#stop, if stopMaster fails, you will still write 'Replication master role stopped for database' to derby.log. Can you be certain that this is the case regardless of what the error is? Some nitpicks you may fix if you have to go another round with this patch: SlaveDatabase: I think sdbThread instead of sdb_Thread is more in line with the naming style used in the derby code. MasterController: The casting to Thread does not seem to be necessary.
        Hide
        V.Narayanan added a comment -

        Thank you for taking a look at the patch Oystein

        I intended the message to be printed that way (similar to the printStackTrace and stop method)

        I have two justifications for this

        1) The only time the message is thrown is when the master has already stopped
        2) It is like saying the master has stopped but the following exception
        occurred (which is this case is only when the master is stopped)

        I have fixed the nits pointed out.

        I was hoping to submit regression tests in a follow up.

        Show
        V.Narayanan added a comment - Thank you for taking a look at the patch Oystein I intended the message to be printed that way (similar to the printStackTrace and stop method) I have two justifications for this 1) The only time the message is thrown is when the master has already stopped 2) It is like saying the master has stopped but the following exception occurred (which is this case is only when the master is stopped) I have fixed the nits pointed out. I was hoping to submit regression tests in a follow up.
        Hide
        Øystein Grøvlen added a comment -

        Thanks, for the latest updates, Narayanan.
        I have tested the patch, and my problem is solved.
        v5 patch committed at revision 642191.

        Show
        Øystein Grøvlen added a comment - Thanks, for the latest updates, Narayanan. I have tested the patch, and my problem is solved. v5 patch committed at revision 642191.
        Hide
        Øystein Grøvlen added a comment -

        Fix ported to 10.4 branch at revision 643399.
        SVN command used:
        svn merge -r642190:642191 https://svn.apache.org/repos/asf/db/derby/code/trunk/

        Show
        Øystein Grøvlen added a comment - Fix ported to 10.4 branch at revision 643399. SVN command used: svn merge -r642190:642191 https://svn.apache.org/repos/asf/db/derby/code/trunk/

          People

          • Assignee:
            V.Narayanan
            Reporter:
            V.Narayanan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development