Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-12791

MessageIn logic to determine if the message is cross-node is wrong

    XMLWordPrintableJSON

Details

    • Low

    Description

      MessageIn has the following code to read the 'creation time' of the message on the receiving side:

      public static ConstructionTime readTimestamp(InetAddress from, DataInputPlus input, long timestamp) throws IOException
      {
          // make sure to readInt, even if cross_node_to is not enabled
          int partial = input.readInt();
          long crossNodeTimestamp = (timestamp & 0xFFFFFFFF00000000L) | (((partial & 0xFFFFFFFFL) << 2) >> 2);
          if (timestamp > crossNodeTimestamp)
          {
              MessagingService.instance().metrics.addTimeTaken(from, timestamp - crossNodeTimestamp);
          }
          if(DatabaseDescriptor.hasCrossNodeTimeout())
          {
              return new ConstructionTime(crossNodeTimestamp, timestamp != crossNodeTimestamp);
          }
          else
          {
              return new ConstructionTime();
          }
      }
      

      where timestamp is really the local time on the receiving node when calling that method.

      The incorrect part, I believe, is the timestamp != crossNodeTimestamp used to set the isCrossNode field of ConstructionTime. A first problem is that this will basically always be true: for it to be false, we'd need the low-bytes of the timestamp taken on the sending node to coincide exactly with the ones taken on the receiving side, which is very unlikely. It is also a relatively meaningless test: having that test be false basically means the lack of clock sync between the 2 nodes is exactly the time the 2 calls to System.currentTimeMillis() (on sender and receiver), which is definitively not what we care about.

      What the result of this test is used for is to determine if the message was crossNode or local. It's used to increment different metrics (we separate metric local versus crossNode dropped messages) in MessagingService for instance. And that's where this is kind of a bug: not only the timestamp != crossNodeTimestamp, but if DatabaseDescriptor.hasCrossNodeTimeout(), we always have this isCrossNode false, which means we'll never increment the "cross-node dropped messages" metric, which is imo unexpected.

      That is, it is true that if DatabaseDescriptor.hasCrossNodeTimeout() == false, then we end using the receiver side timestamp to timeout messages, and so you end up only dropping messages that timeout locally. And in that sense, always incrementing the "locally" dropped messages metric is not completely illogical. But I doubt most users are aware of those pretty specific nuance when looking at the related metrics, and I'm relatively sure users expect a metrics named droppedCrossNodeTimeout to actually count cross-node messages by default (keep in mind that DatabaseDescriptor.hasCrossNodeTimeout() is actually false by default).

      Anyway, to sum it up I suggest that the following change should be done:

      1. the timestamp != crossNodeTimestamp test is definitively not what we want. We should at a minimum just replace it to true as that's basically what it ends up being except for very rare and arguably random cases.
      2. given how the ConstructionTime.isCrossNode is used, I suggest that we really want it to mean if the message has shipped cross-node, not just be a synonymous of DatabaseDescriptor.hasCrossNodeTimeout(). It should be whether the message shipped cross-node, i.e. whether from == BroadcastAdress() or not.

      Attachments

        Activity

          People

            slebresne Sylvain Lebresne
            slebresne Sylvain Lebresne
            Sylvain Lebresne
            Stefania Alborghetti
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: