Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 2.0.8, 2.1 beta2
    • Component/s: None
    • Labels: None

      Description

      Large batches on a coordinator can cause a lot of node stress. I propose adding a WARN log entry whenever a batch exceeds a configurable size. This will give operators more visibility into something that originates on the developer side.

      New yaml setting with a 5k default.

      # Log WARN on any batch size exceeding this value. 5k by default.
      # Caution should be taken when increasing this threshold, as larger batches can lead to node instability.

      batch_size_warn_threshold: 5k
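
      For illustration, a minimal sketch of the check this proposal implies, assuming the 2.0-era types named later in this thread (IMutation#getColumnFamilies(), ColumnFamily#dataSize()) and a hypothetical threshold accessor; this is not the committed patch, which does its check inside BatchStatement:

      // Sketch only. Sums the mutation payload of a batch and warns when it
      // exceeds the configured threshold.
      long size = 0;
      for (IMutation mutation : mutations)
          for (ColumnFamily cf : mutation.getColumnFamilies())
              size += cf.dataSize();

      long warnThreshold = DatabaseDescriptor.getBatchSizeWarnThreshold(); // hypothetical accessor
      if (size > warnThreshold)
          logger.warn("Batch of statements for {} is of size {}, exceeding specified threshold of {} by {}",
                      ksCfPairs, size, warnThreshold, size - warnThreshold);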

      1. 6487-cassandra-2.0.patch
        6 kB
        Lyuben Todorov
      2. 6487-cassandra-2.0_v2.patch
        7 kB
        Lyuben Todorov


          Activity

          martin.grotzke Martin Grotzke added a comment - edited

          Thanks Patrick McFadin for the clarification! I saw you created CASSANDRA-10876 and also the related CASSANDRA-8825 - I wasn't aware of this ticket, good to see.

          Would you say the cost of a single partition single statement batch is exactly the same as a "normal" single statement?
          And how would you compare a single partition batch with multiple insert statements to just multiple insert statements in terms of server load / throughput - is executing multiple single partition statements as a batch a valid approach to increase throughput?

          lorina@datastax.com Lorina Poland added a comment -

          There is also a DOC ticket. I'll have to look it up for you tomorrow. 
          Lorina

          pmcfadin Patrick McFadin added a comment -

          Martin Grotzke and Paolo Ragone Sorry! I just caught these comments.

          Valid points on a single partition and I think it warrants a change in the way we log WARN and ERROR for batches. The original intent was to prevent horrible anti-patterns on multi-partition batches. In the case of a single partition update, the impact is only in network payload size. Since there is no need for the coordinator to track all of the mutations across the batch partitions, the load is much less.

          I'll make an updated ticket to reflect that difference.

          Thanks for the comments and raising this issue.

          pragone Paolo Ragone added a comment -

          Hi, I wanted to echo the question from Martin Grotzke. We have a similar use case in which we save an Aggregate as a list of "events" under the aggregate key. This means the batch is always limited to one key, and should therefore execute exclusively on one node, with no intra-node communication/dependency needed.
          This is why I also think this usage should be safe.

          I understand that the warning is just a warning, but better understanding this can help shape better use of the Cassandra model.

          martin.grotzke Martin Grotzke added a comment -

          Any feedback here? I'd like to understand the issues highlighted by the WARN log in combination with 1) single-partition and/or 2) single-statement batches.

          martin.grotzke Martin Grotzke added a comment - edited

          Lyuben Todorov Can you please explain why the batch size is relevant in both scenarios 1) and 2)?

          What are the extra costs of a single-partition batch (with multiple statements/inserts) that justify logging this warning?
          How is a single-statement batch (which obviously goes to a single partition) handled differently from a single statement not sent as a BATCH?

          Regarding single-partition batches, my understanding is that they don't cause any extra costs. This understanding is based e.g. on CASSANDRA-6737 ("A batch statements on a single partition should not create a new CF object for each update") and on http://christopher-batey.blogspot.de/2015/02/cassandra-anti-pattern-misuse-of.html, which says (in the paragraph "So when should you use unlogged batches?")

          Well customer id is the partition key, so this will be no more coordination work than a single insert and it can be done with a single operation at the storage layer.


          What's wrong with this understanding, in which way are single-partition batches more expensive?

          lyubent Lyuben Todorov added a comment -

          Martin Grotzke Yes, the batch size is relevant in both scenarios 1 and 2. Remember it's a warning; you can tweak the settings to increase the warn threshold using the setting batch_size_warn_threshold_in_kb. Also notice that the fail threshold is 10x the warn threshold by default (batch_size_fail_threshold_in_kb: 50).
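
          For reference, the two settings as described in this comment, as they would appear in cassandra.yaml (defaults per this comment; which Cassandra versions ship each setting, and the exact yaml comment wording, may differ):

          # Log a WARN on any batch whose serialized size exceeds this value.
          batch_size_warn_threshold_in_kb: 5
          # Reject any batch whose serialized size exceeds this value.
          batch_size_fail_threshold_in_kb: 50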

          martin.grotzke Martin Grotzke added a comment -

          Is the batch size (5kB) also relevant, if
          1) a batch writes only a single partition
          2) or it contains only a single statement/insert?

          Background: We're using akka-persistence-cassandra (which writes single events as batch as well AFAICS) and get warnings for ap_messages like WARN [SharedPool-Worker-78] 2015-11-11 17:30:07,489 BatchStatement.java:252 - Batch of prepared statements for [search.ap_messages] is of size 10243, exceeding specified threshold of 5120 by 5123..
          Therefore I'd like to better understand this issue, so we know how to proceed.

          cowardlydragon Constance Eustace added a comment -

          What happens if a single-statement BATCH exceeds the limit?

          I ask this because the batch size limit will impact setting the timestamp on a statement. If we have a collection of updates, the decision to batch or not batch them happens further downstream, when the collection of statements is analyzed.

          HOWEVER, the UPDATE statement only supports the USING timestamp in the middle of the statement.

          The BATCH statement allows you to make the timestamp decision later on.

          If a BATCH is encountered with a SINGLE STATEMENT, can the limit be ignored and have it be treated as a normal update?

          I ask because there is discussion of making this a hard limit.

          alprema Kévin LOVATO added a comment -

          We experienced some problems when using (ridiculously) big batches here, Cassandra was throwing odd exceptions:

          java.lang.IndexOutOfBoundsException: Invalid combined index of 1565817280, maximum is 482109
                  at org.jboss.netty.buffer.SlicedChannelBuffer.<init>(SlicedChannelBuffer.java:46)
                  at org.jboss.netty.buffer.HeapChannelBuffer.slice(HeapChannelBuffer.java:201)
                  at org.jboss.netty.buffer.AbstractChannelBuffer.readSlice(AbstractChannelBuffer.java:323)
                  at org.apache.cassandra.transport.CBUtil.readValue(CBUtil.java:295)
                  at org.apache.cassandra.transport.CBUtil.readValueList(CBUtil.java:340)
                  at org.apache.cassandra.transport.messages.BatchMessage$1.decode(BatchMessage.java:62)
                  at org.apache.cassandra.transport.messages.BatchMessage$1.decode(BatchMessage.java:43)
                  at org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:212)
                  at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
                  at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:68)
                  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
                  at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
                  at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
                  at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
                  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
                  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
                  at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
                  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
                  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
                  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
                  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)
          

          We fixed our code so it behaves and stopped sending huge batches, but since it appears that Cassandra can't handle very big batches, wouldn't it make more sense to simply refuse them instead of warning / crashing?

          jtravis Jon Travis added a comment -

          I'm batching on a single partition only.
          I have a table defined as:
          CREATE TABLE store.blobs (
              account_name text,
              m_guid text,
              m_blob text,
              PRIMARY KEY (account_name, m_guid)
          )

          I am using a prepared statement with an unlogged batch to insert many blobs into the same account all at once:
          INSERT INTO blobs (account_name, m_guid, m_blob) VALUES (?, ?, ?)

          My understanding is that this is a pretty decent way of doing it: http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
          (re Batching Prepared Statements).

          I could do these all individually, but there would clearly be some overhead.

          So, the options are: don't use the prepared statement / batch, jack up the threshold, or change the Cassandra code to avoid logging on unlogged batches.
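
          A sketch of this pattern with the DataStax Java driver's unlogged batch (illustrative contact point and a hypothetical Blob holder; the table is as defined above):

          import com.datastax.driver.core.*;

          Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
          Session session = cluster.connect("store");

          PreparedStatement ps =
              session.prepare("INSERT INTO blobs (account_name, m_guid, m_blob) VALUES (?, ?, ?)");

          // Every row shares the same partition key (account_name), so the
          // batch targets a single partition.
          BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
          for (Blob b : blobs) // Blob is a hypothetical holder with guid/payload fields
              batch.add(ps.bind("acme", b.guid, b.payload));

          session.execute(batch);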

          jbellis Jonathan Ellis added a comment -

          There's essentially no performance gain from batching across multiple partitions.

          jtravis Jon Travis added a comment -

          Does it make sense to log this even when the batch is Type.UNLOGGED? When writing as fast as I can, it sounds like a BatchStatement with a single execution is the fastest thing to do. However, I'm now getting these warnings. My options are to jack up the batch size threshold, or stop using batch statements.

          jbellis Jonathan Ellis added a comment -

          committed

          benedict Benedict added a comment -

          Good point (can have multiple partitions per CF). LGTM.

          +1

          lyubent Lyuben Todorov added a comment -

          v2 uses Iterables.concat(Iterables.transform(...)) to build an Iterable of CFs from IMutations, but I kept the HashSet; using an ArrayList does yield duplicate values.

          benedict Benedict added a comment -

          Basic approach looks good. I would:

          1. Use Iterables.concat(Iterables.transform()) to create the set of items to process (this requires changing the signature to accept an Iterable instead of a Collection).
          2. Only construct the set of affected KS/CFs when the size warn limit is breached. You can just use an ArrayList as well, since uniqueness is guaranteed (see getMutations(), which merges mutations for the same CF into one IMutation; feel free to add a comment to that method explaining this, so the next person to read it doesn't have to figure it out). Both points are sketched below.
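
          A minimal sketch of both review points, assuming the 2.0-era types named in this thread (IMutation, ColumnFamily, CFMetaData); this is not the committed patch:

          // Lazily flatten the ColumnFamilies of all mutations with Guava,
          // avoiding an intermediate collection on the fast path.
          Iterable<ColumnFamily> cfs = Iterables.concat(
              Iterables.transform(mutations, new Function<IMutation, Collection<ColumnFamily>>()
              {
                  public Collection<ColumnFamily> apply(IMutation m)
                  {
                      return m.getColumnFamilies();
                  }
              }));

          long size = 0;
          for (ColumnFamily cf : cfs)
              size += cf.dataSize();

          if (size > warnThreshold)
          {
              // Only build the ks.cf name list on the (rare) warning path.
              List<String> ksCfPairs = new ArrayList<String>();
              for (ColumnFamily cf : cfs)
                  ksCfPairs.add(cf.metadata().ksName + "." + cf.metadata().cfName);
              logger.warn("Batch of statements for {} is of size {}, exceeding specified threshold of {} by {}",
                          ksCfPairs, size, warnThreshold, size - warnThreshold);
          }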
          lyubent Lyuben Todorov added a comment -

          Patch is back to using the 5kb limit and checks batch size via the method suggested by Aleksey (checking column size based on the mutations that are about to be applied).

          benedict Benedict added a comment -

          I suggest using the ColumnFamily.dataSize() method as Aleksey suggested: in the BatchStatement.executeWithConditions() and executeWithoutConditions() methods we have access to the fully constructed ColumnFamily objects we will apply. In the former we construct a single CF, updates, and in the latter we can iterate over each of the IMutations and call getColumnFamilies().

          Warning on the prepared size is probably not meaningful, because it does not say anything about how big the data we're applying is.

          lyubent Lyuben Todorov added a comment - edited

          Just noticed that we're actually already using the memory meter for checking batch size when it might get placed into the prepared statement cache, so why not log based on that value (calculated in BatchStatement#measureForPreparedCache). As for non-prepared batch statements, there we can enforce a limit based on count of statements.
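
          For context, the memory meter here is jamm's reflection-based MemoryMeter (hence the fast-path concern raised elsewhere in this thread); a hedged sketch of how it is typically used:

          import org.github.jamm.MemoryMeter;

          // jamm walks the object graph via reflection, and the JVM must be
          // started with -javaagent:jamm.jar for accurate sizes.
          MemoryMeter meter = new MemoryMeter();
          long bytes = meter.measureDeep(batchStatement); // deep (retained) size estimate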

          iamaleksey Aleksey Yeschenko added a comment -

          Anyway, I'm not saying that this is the way to go - merely listing options.

          iamaleksey Aleksey Yeschenko added a comment -

          No, that's not what I meant. I meant the size of the resulting Mutation-s (RowMutation-s pre 2.1), as a sum of ColumnFamily#dataSize()-s for each of the Mutation#getColumnFamilies(). Of course it would affect the path - any extra stuff you do would.

          lyubent Lyuben Todorov added a comment -

          Aleksey Yeschenko I assume you mean calling ByteBuffer#limit in BatchStatement#executeWithPerStatementVariables. I like the idea; it will be much more accurate than just counting queries, and since it's just a loop with a counter it shouldn't hurt the fast path, right? /cc Benedict.

          Maybe count of batch size warnings, largest batch size seen, most recent batch size over the limit.

          Jack Krupansky +1, maybe also something like a total statement count over the limit (e.g. if a batch exceeds the limit by 10, and this occurs 4 times, that metric will end up at 40).

          jkrupan Jack Krupansky added a comment -

          Is this something important enough that an Ops team might want to monitor in an automated manner, like with an mbean, for OpsCenter and other monitoring tools? Maybe count of batch size warnings, largest batch size seen, most recent batch size over the limit.
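
          A hedged sketch of the kind of mbean this suggests (illustrative names only; no such mbean was part of this ticket's patch):

          import java.lang.management.ManagementFactory;
          import java.util.concurrent.atomic.AtomicLong;
          import javax.management.MBeanServer;
          import javax.management.ObjectName;

          // JMX requires the management interface to be named <Class>MBean.
          interface BatchSizeMetricsMBean
          {
              long getBatchSizeWarnings();
              long getLargestBatchSeen();
              long getLastOversizeBatch();
          }

          public class BatchSizeMetrics implements BatchSizeMetricsMBean
          {
              private final AtomicLong warnings = new AtomicLong();
              private volatile long largestBatchSeen;
              private volatile long lastOversizeBatch;

              public long getBatchSizeWarnings() { return warnings.get(); }
              public long getLargestBatchSeen() { return largestBatchSeen; }
              public long getLastOversizeBatch() { return lastOversizeBatch; }

              // Called from wherever the oversize warning is logged.
              public void recordOversizeBatch(long size)
              {
                  warnings.incrementAndGet();
                  largestBatchSeen = Math.max(largestBatchSeen, size);
                  lastOversizeBatch = size;
              }

              public static BatchSizeMetrics register() throws Exception
              {
                  BatchSizeMetrics metrics = new BatchSizeMetrics();
                  MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                  server.registerMBean(metrics, new ObjectName("org.example:type=BatchSizeMetrics")); // hypothetical name
                  return metrics;
              }
          }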

          iamaleksey Aleksey Yeschenko added a comment -

          Not saying that we should, but we can calculate the size of the resulting processed collection of Mutation-s w/out using reflection, and warn based on that.

          lyubent Lyuben Todorov added a comment -

          We can count statements but the problem there is that if a batch has identical statements grouped together, then counting statements means we are just guessing whether the batch is large or not. Example:

          // A batch of 10 statements, measured with the memory meter, comes to 81912 bytes
          // for a table with schema "CREATE TABLE tbl (col1 text PRIMARY KEY);"
          PreparedStatement prepStatement = session.prepare("INSERT INTO db.tbl (col1) VALUES (?)");
          BatchStatement batch = new BatchStatement();
          batch.add(prepStatement.bind("val1"));
          ...
          batch.add(prepStatement.bind("val10"));
          

          Increasing it to 100 produces a batch of size 82456, only 544 bytes more for 10x the statements. That said, this applies to batches where the queries are identical, and we are only displaying a warning, not actually restricting such batches. So I'll attach the patch with the updated message style suggested by Jonathan Ellis, where the default is set to 50 statements per batch (based on the twissandra model below, this allows for 16 posts in a batch, where the batch of 48 queries is about 89kb, which seems reasonable).

          INSERT INTO tweets (tweet_id, username, body) VALUES (8ffb88b3-ae60-48c2-bb96-c4f2d08c4ceb, 'lyubent', 'epic msg');
          INSERT INTO userline (username, tweet_id, time) VALUES ('lyubent', 8ffb88b3-ae60-48c2-bb96-c4f2d08c4ceb, 66bd94a0-c17d-11e3-9c7a-4366e868fc79);
          INSERT INTO timeline (username, tweet_id, time) VALUES ('follower', 8ffb88b3-ae60-48c2-bb96-c4f2d08c4ceb, 66bd94a0-c17d-11e3-9c7a-4366e868fc79);
          

          p.s. Example warn output:

          WARN 17:28:19,652 Batch of statements for [db.timeline, db.userline, db.tweets] is of size 51, exceeding specified threshold of 50 by 1.
          
          jbellis Jonathan Ellis added a comment -

          Okay, then the first patch is actually what we want for that.

          Problem is, we can't compute size-in-bytes without MemoryMeter, which is reflection-based so I wouldn't want to put it in the fast path.

          If you're okay with counting statements instead I think that will be more lightweight.

          pmcfadin Patrick McFadin added a comment -

          Yes that was in bytes. Just in my own experience, I don't recommend more than ~100 mutations per batch. Doing some quick math I came up with 5k as 100 x 50 byte mutations.

          Totally up for debate.

          jbellis Jonathan Ellis added a comment -

          Oops, I skimmed too fast and thought we were counting statements, not bytes. Is that what you were thinking when you estimated 5k, Patrick McFadin?

          lyubent Lyuben Todorov added a comment -

          Sure thing, changed from kb to bytes and updated the warning message in v2.

          jbellis Jonathan Ellis added a comment -

          Can you make it configurable in 1s instead of 1000s?

          Bikeshed: would prefer format of

          Batch of statements for [test.cf, test.cf2, test2.cf] is of size 11024, exceeding specified threshold of 7168
          
          lyubent Lyuben Todorov added a comment -

          Added the batch_size_warn_threshold setting to cassandra.yaml and altered QueryProcessor#processBatch to log a WARN if the batch's size exceeds the new setting.

          Some example output of what the warning looks like (keep in mind that multiple keyspaces/CFs can be updated with one batch):

          WARN  14:29:12 Batch of statements for ks.cf pairs to be updated [test.cf, test.cf2, test2.cf] is of size 11024 and exceeds specified threshold of 7168.
          WARN  14:31:59 Batch of statements for ks.cf pairs to be updated [test.cf] is of size 1448 and exceeds specified threshold of 1024.
          
          pmcfadin Patrick McFadin added a comment -

          Sure. Can't see any reason not to add more info if it's easy to add.

          atobey@datastax.com Albert P Tobey added a comment -

          If it's not out of the way, it would help to include the keyspace and column family and maybe the session ID/info.


            People

            • Assignee:
              lyubent Lyuben Todorov
              Reporter:
              pmcfadin Patrick McFadin
              Reviewer:
              Benedict
            • Votes:
              0 Vote for this issue
              Watchers:
              16 Start watching this issue
