Geronimo
  1. Geronimo
  2. GERONIMO-4184

In-doubt transaction Id's could be reused during server startup

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0, 2.0.1, 2.0.2, 2.1, 2.1.1, 2.1.4
    • Fix Version/s: Wish List
    • Component/s: transaction manager
    • Security Level: public (Regular issues)
    • Labels:
      None

      Description

      During server restart, we may reuse an Xid for a transaction which is in-doubt. Potentially confusing a resource manager. We need to insure this does not occur. Simple way is to remember the largest Xid in tran log and start with a larger number.

      1. GERONIM-4184.patch
        0.7 kB
        Vamsavardhana Reddy

        Activity

        Hide
        Joe Bohn added a comment -

        move fix version to 2.1.3 from 2.1.2

        Show
        Joe Bohn added a comment - move fix version to 2.1.3 from 2.1.2
        Hide
        Donald Woods added a comment -

        moving from 2.1.3 to 2.1.4

        Show
        Donald Woods added a comment - moving from 2.1.3 to 2.1.4
        Hide
        Jarek Gawor added a comment -

        Updating versions as it probably will not get fixed for 2.1.4.

        Show
        Jarek Gawor added a comment - Updating versions as it probably will not get fixed for 2.1.4.
        Hide
        David Jencks added a comment -

        Would be great to fix...

        Show
        David Jencks added a comment - Would be great to fix...
        Hide
        Vamsavardhana Reddy added a comment -

        GERONIMO-4393 has introduced two random bytes into the baseId to avoid duplication of baseId across multiple JVMs on the same machine. That fix would still not suffice completely as the two random bytes are added to baseId only if the no-argument constructor of XidFactoryImpl is used and Geronimo uses the other constructor that takes byte[] tmId as argument. If we add two random bytes to baseId even in the case of the constructor with tmId, we can minimize the chance of reuse of In-doubt transaction Id's as the baseId would be different on each run of the server. We can even increase the length of random bytes (to say 4 instead of 2) to further reduce the chance of reuse of In-doubt transaction Id's. Any suggestions or comments to this approach?

        Show
        Vamsavardhana Reddy added a comment - GERONIMO-4393 has introduced two random bytes into the baseId to avoid duplication of baseId across multiple JVMs on the same machine. That fix would still not suffice completely as the two random bytes are added to baseId only if the no-argument constructor of XidFactoryImpl is used and Geronimo uses the other constructor that takes byte[] tmId as argument. If we add two random bytes to baseId even in the case of the constructor with tmId, we can minimize the chance of reuse of In-doubt transaction Id's as the baseId would be different on each run of the server. We can even increase the length of random bytes (to say 4 instead of 2) to further reduce the chance of reuse of In-doubt transaction Id's. Any suggestions or comments to this approach?
        Hide
        Vamsavardhana Reddy added a comment -

        GERONIM-4184.patch: Add two random bytes entropy even when XidFactoryImpl(byte[]) constructor is used.
        Patch created against branches\geronimo-txmanager-parent-2.2.

        Show
        Vamsavardhana Reddy added a comment - GERONIM-4184.patch: Add two random bytes entropy even when XidFactoryImpl(byte[]) constructor is used. Patch created against branches\geronimo-txmanager-parent-2.2.
        Hide
        David Jencks added a comment -

        I think this patch will break recovery. The first 8 bytes of the byte arrays in the xid are a long counter, the rest is the tmid. Recovery needs to be able to determine if an xid coming back from an XAResource came from this tm. It does this by matching the bytes after 8 in the global xid with the baseid in the XidFactory. If the base id is different every time you start the tm, there's no way to determine if a branch that has been prepared on an XAResource but not recorded in the tm log came from this tm.

        I think the problem here is that our documentation is not sufficiently insistent that you NEED to configure multiple geronimo servers using the same RM with DIFFERENT tmids.

        As long as the above is followed, we could randomize the most significant 2 bytes of the long counter. However I think it would be better to track the counter persistently such as by reserving blocks of say 1024 ids and recording this persistently either in a log record or in a separate file.

        Show
        David Jencks added a comment - I think this patch will break recovery. The first 8 bytes of the byte arrays in the xid are a long counter, the rest is the tmid. Recovery needs to be able to determine if an xid coming back from an XAResource came from this tm. It does this by matching the bytes after 8 in the global xid with the baseid in the XidFactory. If the base id is different every time you start the tm, there's no way to determine if a branch that has been prepared on an XAResource but not recorded in the tm log came from this tm. I think the problem here is that our documentation is not sufficiently insistent that you NEED to configure multiple geronimo servers using the same RM with DIFFERENT tmids. As long as the above is followed, we could randomize the most significant 2 bytes of the long counter. However I think it would be better to track the counter persistently such as by reserving blocks of say 1024 ids and recording this persistently either in a log record or in a separate file.
        Hide
        Vamsavardhana Reddy added a comment -

        Thanks David. The one doubt I had is if introducing two random bytes along with tmId would break anything. Your comment has clarified that.

        Essentially the random bytes introduced by fix for GERONIMO-4393 will result in a different tmId each time the server is started and break transaction recovery. (Geronimo server may be immune to this as we use the XidFactoryImpl constructor with tmId as parameter.) How can we make sure that even the generated tmId would be the same for a server instance with subsequent restarts?

        Show
        Vamsavardhana Reddy added a comment - Thanks David. The one doubt I had is if introducing two random bytes along with tmId would break anything. Your comment has clarified that. Essentially the random bytes introduced by fix for GERONIMO-4393 will result in a different tmId each time the server is started and break transaction recovery. (Geronimo server may be immune to this as we use the XidFactoryImpl constructor with tmId as parameter.) How can we make sure that even the generated tmId would be the same for a server instance with subsequent restarts?
        Hide
        David Jencks added a comment -

        The no-arg constructor is only there for people who don't care about recovery and can't be bothered to set up the tm properly.

        I guess if we're going to do anything here we should figure out a way to store the baseid and counter range in the tx log (in the log object, not necessarily in the log files) and read them in as part of recovery. Then if there's a recorded baseid we can use it.

        Show
        David Jencks added a comment - The no-arg constructor is only there for people who don't care about recovery and can't be bothered to set up the tm properly. I guess if we're going to do anything here we should figure out a way to store the baseid and counter range in the tx log (in the log object, not necessarily in the log files) and read them in as part of recovery. Then if there's a recorded baseid we can use it.

          People

          • Assignee:
            Unassigned
            Reporter:
            Kevan Miller
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Development