Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: None
    • Labels:
      None

      Description

      Based on the discussion in CASSANDRA-2819, we can make the bookkeeping for request times more accurate.

      1. 2858-v3.txt
        9 kB
        Jonathan Ellis
      2. 2858-v2.txt
        9 kB
        Jonathan Ellis
      3. 2858.txt
        7 kB
        Jonathan Ellis

        Activity

        Hide
        jbellis Jonathan Ellis added a comment -

        Thanks Vijay!

        Show
        jbellis Jonathan Ellis added a comment - Thanks Vijay!
        Show
        vijay2win@yahoo.com Vijay added a comment - Looks like we still need a minor edit to 822ee88a38b3862d60b50748382ddf7957907cec, after this change CliTest passes. https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blobdiff;f=src/java/org/apache/cassandra/net/IncomingTcpConnection.java;h=02b40d19855f87bbe82151c2f33b92119e32003c;hp=eeb6b317bfbbac71e4c6d3e0a2253cd57922e707;hb=447fcef48630ea3f60a2c97f76910dbfa1a334f5;hpb=f3e24bd5162eede8ad13abc9c85c90dd971fc110
        Hide
        jbellis Jonathan Ellis added a comment -

        The rebuilding logic in v3 turns out to rely on the sign bit of the int being 1, so that when it gets and-ed with the long, it gets sign-extended:

        .       Long foo = 0xFFFFFFFFFFFFFFFFL;
                int bar = 0xF0000000;
                System.out.println(Long.toHexString(foo & bar));
                System.out.println(Long.toHexString(foo & 0));
        

        This outputs

        fffffffff0000000
        0
        

        Thus, when the high order bit of the lower 32 bits of currentTimeMillis is instead zero, then the and zeros out the entire high 32 of the long we were trying to rebuild with.

        pushed my suggested alternative above in 822ee88a38b3862d60b50748382ddf7957907cec, which does not rely on sign extension.

        Show
        jbellis Jonathan Ellis added a comment - The rebuilding logic in v3 turns out to rely on the sign bit of the int being 1, so that when it gets and-ed with the long, it gets sign-extended: . Long foo = 0xFFFFFFFFFFFFFFFFL; int bar = 0xF0000000; System .out.println( Long .toHexString(foo & bar)); System .out.println( Long .toHexString(foo & 0)); This outputs fffffffff0000000 0 Thus, when the high order bit of the lower 32 bits of currentTimeMillis is instead zero, then the and zeros out the entire high 32 of the long we were trying to rebuild with. pushed my suggested alternative above in 822ee88a38b3862d60b50748382ddf7957907cec, which does not rely on sign extension.
        Hide
        yukim Yuki Morishita added a comment -

        I think this is not working as expected.
        Truncate hangs on two node cluster using 1.2.0-beta1 binary because truncate response gets dropped every time(and this is why CliTest on trunk is failing today).

        I think casting long System.currentTimeMillis to int is fragile, and that's causing this line

        https://github.com/apache/cassandra/blob/cassandra-1.2.0-beta1/src/java/org/apache/cassandra/net/MessageDeliveryTask.java#L43

        always evaluates to true.
        When I tried, System.currentTimeMillis was like 1348691631776, but currentTime there was like 71900832.

        Show
        yukim Yuki Morishita added a comment - I think this is not working as expected. Truncate hangs on two node cluster using 1.2.0-beta1 binary because truncate response gets dropped every time(and this is why CliTest on trunk is failing today). I think casting long System.currentTimeMillis to int is fragile, and that's causing this line https://github.com/apache/cassandra/blob/cassandra-1.2.0-beta1/src/java/org/apache/cassandra/net/MessageDeliveryTask.java#L43 always evaluates to true. When I tried, System.currentTimeMillis was like 1348691631776, but currentTime there was like 71900832.
        Hide
        jbellis Jonathan Ellis added a comment -

        committed

        Show
        jbellis Jonathan Ellis added a comment - committed
        Hide
        jbellis Jonathan Ellis added a comment -

        Oops, didn't mean to include that in the patch. (CASSANDRA-4617 is open to fix that.)

        Show
        jbellis Jonathan Ellis added a comment - Oops, didn't mean to include that in the patch. ( CASSANDRA-4617 is open to fix that.)
        Hide
        vijay2win@yahoo.com Vijay added a comment -

        +1,
        Sorry for the delay spent a lot of time wondering why the test cases where failing (was looking at the wrong places)
        looks like thats because of the following setting in SP

        private static final boolean OPTIMIZE_LOCAL_REQUESTS = false;
        

        Once we set it to true we should be good to commit i guess.
        Thanks!

        Show
        vijay2win@yahoo.com Vijay added a comment - +1, Sorry for the delay spent a lot of time wondering why the test cases where failing (was looking at the wrong places) looks like thats because of the following setting in SP private static final boolean OPTIMIZE_LOCAL_REQUESTS = false ; Once we set it to true we should be good to commit i guess. Thanks!
        Hide
        jbellis Jonathan Ellis added a comment -

        v3 fixes ITC byte arithmetic per Vijay's suggestion.

        ({{ | (input.readInt() << 4 >> 4)}} also works, but Vijay's version is simpler.

        Show
        jbellis Jonathan Ellis added a comment - v3 fixes ITC byte arithmetic per Vijay's suggestion. ({{ | (input.readInt() << 4 >> 4)}} also works, but Vijay's version is simpler.
        Hide
        jbellis Jonathan Ellis added a comment -

        v2 only sends low-order timestamp int.

        Show
        jbellis Jonathan Ellis added a comment - v2 only sends low-order timestamp int.
        Hide
        vijay2win@yahoo.com Vijay added a comment -

        What did you have in mind?

        I was talking about the lower part of the long where we have 2 billion milliseconds (2147483.648 seconds) to count which will give us enough to timeout... varint cannot save those 4 bytes because there is value in it. Makes sense? I should have been clear earlier.

        Show
        vijay2win@yahoo.com Vijay added a comment - What did you have in mind? I was talking about the lower part of the long where we have 2 billion milliseconds (2147483.648 seconds) to count which will give us enough to timeout... varint cannot save those 4 bytes because there is value in it. Makes sense? I should have been clear earlier.
        Hide
        jbellis Jonathan Ellis added a comment -

        Maybe just sending the Integer for timeout instead of long will save us some bandwidth

        What did you have in mind? milliseconds gets too large for an int in less than a year. Even 100ths of seconds do. Cutting to 10ths of a second seems like we're losing too much resolution.

        Show
        jbellis Jonathan Ellis added a comment - Maybe just sending the Integer for timeout instead of long will save us some bandwidth What did you have in mind? milliseconds gets too large for an int in less than a year. Even 100ths of seconds do. Cutting to 10ths of a second seems like we're losing too much resolution.
        Hide
        vijay2win@yahoo.com Vijay added a comment -

        +1
        Side Note (Can be ignored): Maybe just sending the Integer for timeout instead of long will save us some bandwidth

        Show
        vijay2win@yahoo.com Vijay added a comment - +1 Side Note (Can be ignored): Maybe just sending the Integer for timeout instead of long will save us some bandwidth
        Hide
        jbellis Jonathan Ellis added a comment -

        The ExpiringMap of callbacks on the client side always waits until the RPC timeout to expire messages

        Not so, ResponseVerbHandler calls removeRegisteredCallback.

        A client thread calling get() on a callback has a timeout independent from the timeout for the Callback in the ExpiringMap

        That would be nice to clean up in a separate ticket.

        The originally mentioned issue of not having an accurate timeout on the server side

        This is the most important part to address. Patch attached.

        Show
        jbellis Jonathan Ellis added a comment - The ExpiringMap of callbacks on the client side always waits until the RPC timeout to expire messages Not so, ResponseVerbHandler calls removeRegisteredCallback. A client thread calling get() on a callback has a timeout independent from the timeout for the Callback in the ExpiringMap That would be nice to clean up in a separate ticket. The originally mentioned issue of not having an accurate timeout on the server side This is the most important part to address. Patch attached.
        Hide
        mw Melvin Wang added a comment -

        For 3rd bullet, would it be good enough to add a 'cancel' method to IAsyncCallback so that we it got expired in ExpiringMap, cancel will be called which will unblock the get() method and possibly throwing TimeoutException?

        Show
        mw Melvin Wang added a comment - For 3rd bullet, would it be good enough to add a 'cancel' method to IAsyncCallback so that we it got expired in ExpiringMap, cancel will be called which will unblock the get() method and possibly throwing TimeoutException?
        Hide
        mw Melvin Wang added a comment -

        Regarding to the first bullet point, why not introducing the creation time in the message header, so that whenever we need to know the remaining time we could just compare the current time with it?

        Show
        mw Melvin Wang added a comment - Regarding to the first bullet point, why not introducing the creation time in the message header, so that whenever we need to know the remaining time we could just compare the current time with it?
        Hide
        stuhood Stu Hood added a comment - - edited

        There are a few issues here:

        • The originally mentioned issue of not having an accurate timeout on the server side
          • Add a creation time on the client side? Set the remaining time immediately before serialization?
        • The ExpiringMap of callbacks on the client side always waits until the RPC timeout to expire messages
          • The Callback could eagerly expire itself from the map when it is is satisfied
        • A client thread calling get() on a callback has a timeout independent from the timeout for the Callback in the ExpiringMap: they can become disconnected
          • One way to solve that would be to change the IAsyncCallback interface to be more like a Future, such that it is notified when it expires, and could kill anyone blocking on get() with a TimeoutException
        Show
        stuhood Stu Hood added a comment - - edited There are a few issues here: The originally mentioned issue of not having an accurate timeout on the server side Add a creation time on the client side? Set the remaining time immediately before serialization? The ExpiringMap of callbacks on the client side always waits until the RPC timeout to expire messages The Callback could eagerly expire itself from the map when it is is satisfied A client thread calling get() on a callback has a timeout independent from the timeout for the Callback in the ExpiringMap: they can become disconnected One way to solve that would be to change the IAsyncCallback interface to be more like a Future, such that it is notified when it expires, and could kill anyone blocking on get() with a TimeoutException

          People

          • Assignee:
            jbellis Jonathan Ellis
            Reporter:
            kingryan Ryan King
            Reviewer:
            Vijay
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development