Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.1.0
    • Component/s: API, Core
    • Labels:

      Description

      Perform benchmarks to compare the performance of string and pre-serialized binary parameters to prepared statements.

          Activity

          Eric Evans added a comment -

          v1-0001-CASSANDRA-3634-generated-thrift-code.txt and v1-0002-change-bind-parms-from-string-to-bytes.txt convert string bind params to binary for purposes of performance testing.
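
          As a rough illustration (a minimal sketch with made-up names, not code from the patches), the two parameter styles under comparison look like this:

              import java.nio.ByteBuffer;
              import java.nio.charset.StandardCharsets;
              import java.util.Arrays;
              import java.util.List;

              // Illustrative only: the two bind-parameter styles being benchmarked.
              public class ParamStyles {
                  public static void main(String[] args) {
                      // String params: the server parses each value with the column
                      // validator's fromString() before storing it.
                      List<String> stringParams = Arrays.asList("key1", "42");

                      // Pre-serialized binary params: the client encodes each value into
                      // the column's native byte format and the server uses it as-is.
                      List<ByteBuffer> binaryParams = Arrays.asList(
                              ByteBuffer.wrap("key1".getBytes(StandardCharsets.UTF_8)),
                              ByteBuffer.allocate(8).putLong(0, 42L));

                      System.out.println(stringParams.size() + " string params, "
                              + binaryParams.size() + " binary params");
                  }
              }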

          Eric Evans added a comment -

          stress-change-bind-parms-to-BB.patch updates stress to use binary query parameters for prepared statements.

          This patch only updates the operations used in testing (it would need more work before committing).

          Eric Evans added a comment -

          Here is the performance comparison. I stuck to the same tests I performed earlier (those earlier results can be found here). The patches to support binary query parameters for Cassandra and stress are attached to this issue, and the raw results can be found here.

          Note: Percentages listed are in relation to RPC performance.

          Inserts, 20M rows x 5 columns

          Type | Average OP rate | Average Latency
          RPC | 23,681/s | 1.1ms
          CQL | 21,128/s (-11%) | 1.3ms (+11%)
          CQL w/ Prepared statements | 23,911/s | 1.1ms
          CQL w/ Prepared statements (binary parms) | 24,919/s (+5%) | 1.2ms (+5%)

          Inserts, 10M rows x 5 columns, KEYS index

          Type | Average OP rate | Average Latency
          RPC | 10,054/s | 5ms
          CQL | 9,326/s (-7%) | 5.4ms (+8%)
          CQL w/ Prepared statements | 10,413/s (+3%) | 4.8ms (-3%)
          CQL w/ Prepared statements (binary parms) | 10,299/s (+2%) | 5ms

          Counter increments, 10M rows x 5 columns

          Type | Average OP rate | Average Latency
          RPC | 22,075/s | 1.2ms
          CQL | 20,645/s (-6%) | 1.2ms (+2%)
          CQL w/ Prepared statements | 24,286/s (+9%) | 1.2ms (-1%)
          CQL w/ Prepared statements (binary parms) | 23,359/s (+5%) | 1.2ms

          Reads, 20M rows x 5 columns

          Type | Average OP rate | Average Latency
          RPC | 22,285/s | 2.1ms
          CQL | 20,080/s (-10%) | 2.3ms (+9%)
          CQL w/ Prepared statements | 22,374/s | 2.1ms (-1%)
          CQL w/ Prepared statements (binary parms) | 22,176/s | 2.1ms
          Rick Shaw added a comment -

          +1

          Looks like "Strings" wins in terms of performance. It offers the most flexibility in transformation as well. I think we have a winner.

          Eric Evans added a comment -

          At Brandon's suggestion, I'm rerunning the insert test with some higher column counts. That should make any per-term performance costs/savings more obvious. I'll post those results when I have them.

          Jonathan Ellis added a comment -

          Is the server on a separate machine from the client here?

          Eric Evans added a comment -

          No, it's not.

          Jonathan Ellis added a comment -

          Let's get Brandon to do some testing on our cluster with separate clients and servers. If strings are testing faster than binary then either

          1. something is wrong with the code, because parsing String -> ByteBuffer can't possibly be faster than just using the ByteBuffer from Thrift (not to mention that Thrift's internal creation of the String object has more overhead than marking a ByteBuffer slice of the frame)
          2. the difference is negligible compared to other factors and the test noise
          3. the difference is hidden by environmental factors, e.g., String runs just as fast as BB but with X% more CPU used

          Splitting out clients/servers will help determine if #3 is playing a role here.
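
          To make #1 concrete, here is a minimal sketch of the two server-side paths (hypothetical methods, assuming a long-typed term; not actual Cassandra code):

              import java.nio.ByteBuffer;

              // Hypothetical sketch of the two server-side paths for a long-typed term.
              public class BindPaths {
                  // String path: allocate a String from the frame, parse it, then
                  // serialize the result back into bytes.
                  static ByteBuffer fromStringParam(String param) {
                      long v = Long.parseLong(param);  // parse text
                      return (ByteBuffer) ByteBuffer.allocate(8).putLong(v).flip();
                  }

                  // Binary path: the ByteBuffer slice from the Thrift frame is already
                  // in the native encoding; no parsing or extra allocation needed.
                  static ByteBuffer fromBinaryParam(ByteBuffer param) {
                      return param;
                  }

                  public static void main(String[] args) {
                      ByteBuffer a = fromStringParam("42");
                      ByteBuffer b = fromBinaryParam(ByteBuffer.allocate(8).putLong(0, 42L));
                      System.out.println(a.equals(b));  // true: same bytes, different cost
                  }
              }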

          Eric Evans added a comment -

          Let's get Brandon to do some testing on our cluster with separate clients and servers.

          I've started another run with one client, one server, but having another set of results would be great, thanks.

          If strings are testing faster than binary then either

          I don't think strings are faster. The differences between String and BB for the insert-with-index, counter-increment, and read tests are very small (i.e., we're looking at #2 here).

          I believe the reason the insert-with-index and counter-increment tests fail to show a significant difference is that those operations push the contention needle in a different direction. This is consistent with the fact that vanilla CQL fares so much better against Thrift on these tests. Also, the counter-increment test only has one bind var (the key), so I would expect the difference to be quite small.

          Like the counter increment test, the read test also only has one bind var (again the key), so I would likewise expect any difference to be lost in the noise.

          The insert test above does indicate some difference: BB beats out Strings by 4% (5 columns equates to 11 bind vars here). I don't have the results in front of me, but I ran a new insert test last night, and I believe this increases to about 10% at 10x the number of columns.

          The choice of initial tests here was based on what I had run earlier (to make the obligatory before/after comparison). Those are still relevant, I think, but to get a better idea of BB vs String we need some results for insert tests with higher column counts (w/ client and server split to rule out #3 above).

          Eric Evans added a comment -

          OK, just for posterity's sake, here are the results from that 50-column insert test. It's less than the 10% I mentioned, but I'm not sure I trust that, since the BB-based test seems to taper off toward the end for reasons unknown.

          BB is +9.4% throughput at the half-way point.

          Type | Average OP rate | Average Latency
          CQL w/ Prepared statements | 7,137/s | 5.7ms
          CQL w/ Prepared statements (binary parms) | 7,257/s | 5.7ms
          Eric Evans added a comment - edited

          To run these tests you need:

          1. https://github.com/eevans/cassandra/tree/3633.stress – updates stress for prepared statements with String-typed args
          2. https://github.com/eevans/cassandra/tree/3634.bb – updates Cassandra to use ByteBuffer-typed prepared statement args
          3. https://github.com/eevans/cassandra/tree/3634.stress.bb – updates stress to use ByteBuffer args with prepared statements

          Use branch #1 to test String arguments, and branches #2 and #3 when testing ByteBuffer arguments.

          This recipe should work:

          $ git clone git://github.com/eevans/cassandra.git
          $ cd cassandra
          $ git checkout -b without_bb origin/3633.stress   # String-typed args
          $ git checkout -b with_bb origin/3634.bb          # ByteBuffer-typed server
          $ git merge origin/3634.stress.bb                 # ByteBuffer-aware stress
          
          Eric Evans added a comment -

          I've re-run the tests with separate client and server, and included some new tests that should better expose the per-term processing costs. Instead of ballooning this ticket with more inline graphs and tables, I'll just leave the link to my Google spreadsheet:

          https://docs.google.com/spreadsheet/ccc?key=0AmAl6Pxmv6AndGxwMnFFaWtMOFVIUkdpbzVGYkptOWc

          Raw results and plots are here: http://people.apache.org/~eevans/3634-1/

          I look forward to comparing these with the tests Brandon is running.

          Jonathan Ellis added a comment -

          Brandon, can you also test increased column size, and not just count? E.g. CFS uses 2MB columns.

          Brandon Williams added a comment -

          I have completed my benchmarks, with some interesting results. Each test was run against a 3-node cluster at RF=1, with a separate client machine pointed at it. Unless otherwise noted, each test was repeated 5 times, and the results are based on the aggregate of those runs. Because of this, for the sake of time, I didn't run all of the same tests that Eric did.

          The following results trim some of the beginning and end off of each run to avoid any warmup/falloff interference. Each run was compared against the others to check for outliers.
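
          As a rough sketch of the aggregation described above (my assumption of the method, not the actual tooling): drop the warmup/falloff intervals, then compute the mean, SD, and nearest-rank percentiles over what remains.

              import java.util.Arrays;

              // Assumed aggregation: trim head/tail samples, then summarize.
              public class RunStats {
                  // nearest-rank percentile over a sorted array
                  static double percentile(double[] sorted, double p) {
                      int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
                      return sorted[Math.max(0, idx)];
                  }

                  public static void main(String[] args) {
                      // per-interval op rates from one run (warmup first, falloff last)
                      double[] ops = {900, 45100, 46800, 47400, 48200, 52100, 1200};
                      double[] trimmed = Arrays.copyOfRange(ops, 1, ops.length - 1);
                      Arrays.sort(trimmed);
                      double mean = Arrays.stream(trimmed).average().orElse(Double.NaN);
                      double var = Arrays.stream(trimmed)
                                         .map(x -> (x - mean) * (x - mean))
                                         .average().orElse(Double.NaN);
                      System.out.printf("mean=%.1f sd=%.1f p50=%.1f p95=%.1f%n", mean,
                              Math.sqrt(var), percentile(trimmed, 50), percentile(trimmed, 95));
                  }
              }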

          Insert 40M rows, 5 columns

          Operations

          Type Mean SD 10th 25th 50th 75th 95th 99th
          CQL 45387.2772643 5904.81307513 37315.9 42172.25 46481.0 49434.75 53178.95 56010.6
          RPC 45931.9053199 6132.87181886 37661.6 42772.0 47025 50124.0 53961.5 57014.84
          PS 48311.0570284 8291.6760434 37031.6 43918.25 49592.5 54021.0 59505.25 63267.25
          BB 54603.8995816 11536.6909915 37070.2 48304 57419 62682 69488.6 74195.64

          Latency differences were negligible.

          Read 40M rows, 5 columns

          Operations

          Type Mean SD 10th 25th 50th 75th 95th 99th
          CQL 51443.1086957 3745.68428455 46905.2 51052.0 52492.0 53435.75 54634.3 55384.79
          RPC 57503.4226592 4003.63039895 51631.8 56866.5 58758 59848.5 61284.0 62149.58
          PS 53620.2478142 3863.83989801 48231.9 52604.75 54854.0 56003.0 57381.25 58247.87
          BB 56141.4066438 3837.89800487 50404.3 55446.25 57226.5 58418.25 59947.45 60845.6

          Latency

          Type Mean SD 10th 25th 50th 75th 95th 99th
          CQL 0.00610705090124 0.000469496778011 0.00581482442558 0.0059168154319 0.00603730067576 0.00620725273782 0.00700885743783 0.00738831808365
          RPC 0.00533403099278 0.000470016449963 0.00501528631633 0.00512358578504 0.0052437593985 0.00542117815048 0.00627800086141 0.00664429953201
          PS 0.00527725020587 0.000575478263522 0.00478626820553 0.00496031451654 0.00513961124805 0.00543807678393 0.00645881637227 0.00710907693204
          BB 0.00550721726674 0.000465152724459 0.0051733781942 0.00529720514525 0.00542902622263 0.00561093846787 0.00644494569622 0.00697081447867

          After this point, I began focusing on PS and BB alone.

          Insert 4M rows, 50 columns

          Operations

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS 16646.3925759 5280.09708065 8170.2 14582 18628 19979 21875.4 22844.2
          BB 18788.0462487 8293.24195773 5300.4 13353 21986 24688 28159.2 29828.76

          Latency

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS nan nan 0.0139596939996 0.0148752865229 0.0160873770492 0.0190529939254 0.0592844420784 nan
          BB nan nan 0.0110781744836 0.0121558163106 0.0138960263181 0.0211759061834 nan nan

          The NaNs indicate periods where nothing happened, likely caused by CSLM (ConcurrentSkipListMap) garbage from the higher column counts.

          Read 4M rows, 50 columns

          Operations

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS 34539.9544554 637.625135771 33708.8 34177 34649 34983 35437.6 35622.2
          BB 34763.4899598 687.137336912 33898.6 34515.25 34887.5 35226.75 35526.15 35677.27

          Latencies were statistically identical.

          Insert 2M rows, 100 columns

          Operations

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS 9942.96043656 3256.29877687 4520.4 9172 11310 12039 12748.0 13093.84
          BB 11187.3251232 5054.66940811 1868.8 8712.0 13169.0 14822.75 16438.7 17366.29

          Latency

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS nan nan 0.0255763122529 0.0265708752266 0.0281904388031 0.0330053557766 0.124159655014 nan
          BB nan nan 0.0196192202003 0.0211057864458 0.0237350334693 0.0366660241253 nan nan

          Read 2M rows, 100 columns

          Operations

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS 19457.1524249 63.7626578468 19375.4 19453 19477 19489 19499.0 19525.76
          BB 19452.0904872 67.9280612887 19391 19441.5 19474 19486.5 19498.0 19518.8

          Latency

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS 0.0169462953925 0.000645556882921 0.0168879243752 0.0169206096685 0.0169354541263 0.0169647912699 0.0170544990146 0.0171358551454
          BB 0.016875082754 0.00061442017596 0.0169232071407 0.0169351055969 0.0169496839184 0.0169832991663 0.0170569914419 0.0171009883986

          Read 1M rows, 5 columns, 2KB values

          Note that I'm omitting inserts here, as the machines were obviously bound by the commit log. I also chose this combination carefully, so as not to be network bound: the approximate peak is 922Mb/s on a gigabit network.

          Operations

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS 12042.5353846 10.7183298178 12028.0 12037 12045 12050 12055.0 12060.76
          BB 11757.3492537 141.932842405 11651.8 11700.0 11780 11822.5 11939.0 11964.6

          Latency

          Type Mean SD 10th 25th 50th 75th 95th 99th
          PS 0.0273751391287 0.000433847035981 0.0273802118349 0.0273989370536 0.0274172059068 0.0274350859706 0.0274714290518 0.0275139832189
          BB 0.00452289713894 0.000443022928354 0.00435995387188 0.004620072698 0.00464701897019 0.00468012476814 0.00470964983515 0.00472619195963

          Notes

          40x5
          • Outside of this test, measuring inserts gets a bit iffy because GC begins playing a significant role, though this should even out with enough runs
          • Even in this test, GC is still likely a factor at this scale, since reads end up being faster than writes
          • BB's dominance on inserts is undeniable, though both it and PS have a significantly higher standard deviation, and BB is very bursty
          • reads were very consistent across the board here; these are very trustworthy results
          4x50
          • reads are again very consistent, but with little difference between PS and BB
          2x100
          • reads are statistically identical
          1x5x2KB
          • throughput is roughly the same between them, but BB's standard deviation is 14x higher; mostly that is a result of PS being extremely consistent.
          • for whatever reason, PS pays a huge latency penalty on large values. This is very consistent across runs.
          Jonathan Ellis added a comment -

          So, I'd summarize this as:

          • for throughput, BB is consistently about 10% faster on inserts, and about equal on reads, across all row types
          • BB has substantially lower latency for large values on reads
          • something is fishy w/ BB stdev that might be worth investigating (generating extra garbage somehow)?

          10% faster writes is a big enough deal that I'm in favor of committing the BB version for 1.1.

          Brandon Williams added a comment -

          BB has substantially lower latency for large values on reads

          I found the problem here: all tests were run with 300 threads, except for this one with BB, which explains both the higher latency and the slightly better throughput.

          Eric Evans added a comment -

          for throughput, BB is consistently about 10% faster on inserts, and about equal on reads, across all row types

          Since there isn't anything here wildly inconsistent with my results, I'd summarize it as ~10% faster on inserts, and about equal on reads, counter increments, and index inserts.

          BB has substantially lower latency for large values on reads

          I don't see how this test can be correct since the cost of parsing the query is identical no matter how wide the rows are, or how large the values.

          something is fishy w/ BB stdev that might be worth investigating (generating extra garbage somehow)?

          This was consistent with what I saw as well, though for the life of me I can't imagine what's causing it.

          10% faster writes is a big enough deal that I'm in favor of committing the BB version for 1.1.

          It's not nearly so compelling to me. 10% is definitely on the high side of making me stand up and take notice, but it's not enormous.

          It's also limited to inserts, and requires that you completely saturate the processors to make it evident at all, which is not a typical workload. That doesn't make it irrelevant, just more relevant to those conducting benchmarks than to real users.

          On the other side, what's at stake is increased complexity for an arbitrary number of clients, and a proven vector for bugs. And, to make this class of bug even more interesting, it has the potential to make otherwise identical queries return different results depending on whether they use the prepared statement API, or the conventional one.

          THAT BEING SAID: I've heard from enough people that were following the results as they came in to know that most people (engineers?) have a hard time looking past a simple faster/slower distinction (even when the difference in question was much less than 10%). If others feel the same, that we should give up this abstraction for 10% faster standard writes, then I won't belabor the point further.

          Brandon Williams added a comment -

          It's not nearly so compelling to me. 10% is definitely on the high side of making me stand up and take notice, but it's not enormous.

          It's more compelling to me when compared in the context of the existing RPC performance. A 5% gain is okay (PS vs RPC), but 16% (BB vs RPC) is a fairly substantial improvement.

          I was a little worried about the variance (even though the worst cases are pretty close), but I ran some tests with the commit log disabled and the deviation is on par with the rest; I think it's just fast enough to push it that hard.

          Eric Evans added a comment -

          It's more compelling to me when compared in the context of the existing RPC performance. A 5% gain is okay (PS vs RPC), but 16% (BB vs RPC) is a fairly substantial improvement.

          I was a little worried about the variance (even though the worst cases are pretty close), but I ran some tests with the commit log disabled and the deviation is on par with the rest; I think it's just fast enough to push it that hard.

          That's interesting. I got so wrapped up in the ByteBuffer vs. String comparison that I lost track of the fact that your results put CQL w/ prepared statements ahead of RPC across the board (which is the most important take-away from this I hope!). That would mean that you're willing to trade that node-side abstraction for performance that is already above-and-beyond RPC. I think I completely overestimated how compelling the simplicity/abstraction vs performance trade-off argument would be to folks.

          Jonathan Ellis added a comment -

          My reasoning is, there aren't a whole lot of places left to pick up an extra 10% performance... Two years ago, or one, maybe 10% isn't such a big deal since there's so much left to optimize. That's no longer the case; I don't think we should knowingly lock our next-gen interface into a lower-performing design. Once made, we're stuck with this decision, or at least with a really, really high barrier to change it.

          On the other hand, we have the downside of extra complexity for the driver authors. While this is a valid point, it's a finite one – once a prepared statement api has been created and debugged, binary vs strings isn't going to matter. It's a one-time fee in exchange for better performance forever. Additionally, sample binary marshalling code already exists for any language with a Thrift driver. So we're really talking about a relatively small amount of work to build a binary-based PS api, over a String one.

          Eric Evans added a comment -

          My reasoning is, there aren't a whole lot of places left to pick up an extra 10% performance... Two years ago, or one, maybe 10% isn't such a big deal since there's so much left to optimize. That's no longer the case; I don't think we should knowingly lock our next-gen interface into a lower-performing design. Once made, we're stuck with this decision, or at least with a really, really high barrier to change it.

          I think a custom protocol (planned for reasons unrelated to performance) could easily be worth 10%. I take your point though, there isn't a lot of low hanging fruit left.

          On the other hand, we have the downside of extra complexity for the driver authors. While this is a valid point, it's a finite one – once a prepared statement api has been created and debugged, binary vs strings isn't going to matter. It's a one-time fee in exchange for better performance forever. Additionally, sample binary marshalling code already exists for any language with a Thrift driver. So we're really talking about a relatively small amount of work to build a binary-based PS api, over a String one.

          I'm probably a little less optimistic about the amount of work or the potential for bugs. A Pycassa bug that comes to mind caused integers to be mis-encoded for more than a year before it was caught and fixed (and this being one of our most (the most?) battle-tested libraries).

          That said, I do understand all of your points.

          Considering the kind of trade-off we're talking about, I wanted this issue to be thoroughly thought through/discussed, with any relevant data readily at hand. The scale is obviously quite different (I'm not citing a full swing of the pendulum here), but the arguments for/against are basically the same ones that spawned CQL in the first place. And, as you said, changing later is prohibitively difficult; we're going to have to live with this decision.

          I posted to client-dev@ earlier (I don't know why I didn't think of that a week ago). They're basically our front-line users in this regard, and I think it would be interesting to hear from some of them (particularly if I'm carrying a mantle none of them care about).

          Sylvain Lebresne added a comment -

          If we are trading some (one-time) additional work for driver authors to support prepared statements against a potential 10%, I'd be in favor of the binary approach (given that prepared statements are an optimization for performance in the first place, and without saying that we shouldn't care about making driver authors' lives as easy as possible).

          Now I'd mention that I think the binary approach is actually more problematic for custom comparators, because drivers will need to expose a way for the user to either say how to pack their strings, or to directly provide pre-serialized binary. That puts a little more burden on the user (of course the user knows how to serialize the strings for his custom comparator, since he wrote the comparator, but this means a duplication of code for him, possibly in multiple languages, which is no fun).

          Jonathan Ellis added a comment -

          this means a duplication of code for him, possibly in multiple languages, which is no fun

          True, but that's the case for a String-based api as well as a binary one.

          Sylvain Lebresne added a comment - edited

          that's the case for a String-based api as well as a binary one.

          For String-based, the translation from String->binary would happen server side, where we do have the custom comparator and thus the fromString() method (which has to be written for the comparator anyway). With binary, you would still have to write the comparator's fromString(), but would also need the same conversion client side.

          Of course, that is if the client has its custom-comparable prepared statement parameters in string form to begin with.

          On the other side, taking binary would likely make it much easier to take binary data (like pictures), without forcing a binary->string encoding client side followed by a string->binary decoding server side. That is probably a bigger deal than the custom comparator thingy, now that I think of it.

          So (and unless I don't understand anything on this issue and what's above is completely false), I'm clearly in favor of binary.

          Jonathan Ellis added a comment - edited

          That's not what I meant. Suppose I have a custom geohash type. In a String-based PS world, I need to provide a geohash.toString method for each client driver. In a binary-based PS world, I need to provide a geohash.toByteBuffer method. So either way you're going to have to write custom code per client driver.
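
          A sketch of that point with a made-up geohash codec (purely illustrative): the per-driver work exists either way, only the output type changes.

              import java.nio.ByteBuffer;

              // Hypothetical custom type: either way the driver needs per-type code.
              public class GeohashCodec {
                  // String-based PS world: render the value as text for fromString().
                  static String toText(double lat, double lon) {
                      return lat + "," + lon;  // made-up text form
                  }

                  // Binary-based PS world: render the value in its native byte form.
                  static ByteBuffer toByteBuffer(double lat, double lon) {
                      ByteBuffer bb = ByteBuffer.allocate(16);
                      bb.putDouble(lat).putDouble(lon);
                      bb.flip();
                      return bb;
                  }

                  public static void main(String[] args) {
                      System.out.println(toText(51.5, -0.12));
                      System.out.println(toByteBuffer(51.5, -0.12).remaining() + " bytes");
                  }
              }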

          Update: this comment was obsoleted before I posted it by concurrent editing. It sounds like we are in violent agreement.

          Jonathan Ellis added a comment -

          Given that the primary purpose of a "real" PS api is for performance (otherwise we could just fake it client-side the way we used to with JDBC), and the feedback from client devs was mixed, I propose that we proceed with the binary PS api. Client implementers who do not wish to deal with this can continue to use the pure string based, non-PS API, and they are no worse off than before.

          Rick Shaw added a comment - edited

          While I still believe the string approach offers the most flexibility, I am 100% +1 on moving forward and getting something firmly in place. If that is ByteBuffer arguments, so be it.

          Rick Shaw added a comment -

          If we settle on ByteBuffer as the encoding of arguments, then I believe we must accompany it with an indicator of how it was encoded. Without it, on the server side we will have to trust that the caller encoded the argument to exactly match the proper encoding for the validator or comparator used in the Term.

          With String encoding as the argument, there is no such requirement, because every validator or comparator can use its fromString() method.

          Complicating this in the JDBC driver is the notion that the caller is free to provide almost any compatible type to the "setter" method, and the "right" data will be passed to the server side. The client side does not have explicit knowledge of the required data types that were dictated by the CQL prepared on the server side. This is really no problem with String, because all args are passed as a string to be dealt with on the other side; but with ByteBuffer, if you let the user pass a number as a string for something that has a validator of type varint, it will throw an Exception.

          If we pass the encoding method in addition to the ByteBuffer, we will have enough info on the server side to transpose the data to the required format dictated by the validator or comparator.
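
          One possible shape for that suggestion (my sketch only, not a proposed patch): pair each buffer with a tag saying how the client encoded it, so the server can re-check or transpose.

              import java.nio.ByteBuffer;
              import java.nio.charset.StandardCharsets;

              // Sketch of a bind parameter carrying its client-side encoding.
              public class TaggedParam {
                  enum Encoding { UTF8_STRING, LONG_BE, RAW }

                  final Encoding encoding;
                  final ByteBuffer value;

                  TaggedParam(Encoding encoding, ByteBuffer value) {
                      this.encoding = encoding;
                      this.value = value;
                  }

                  public static void main(String[] args) {
                      // e.g. a varint column bound from a string: the server could see
                      // UTF8_STRING and transpose "10" into the validator's format.
                      TaggedParam p = new TaggedParam(Encoding.UTF8_STRING,
                              ByteBuffer.wrap("10".getBytes(StandardCharsets.UTF_8)));
                      System.out.println(p.encoding + ": " + p.value.remaining() + " bytes");
                  }
              }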

          Jonathan Ellis added a comment -

          If we settle on ByteBuffer as the encoding of arguments, then I believe we must accompany it with an indicator of how it was encoded. Without it, on the server side we will have to trust that the caller encoded the argument to exactly match the proper encoding for the validator or comparator used in the Term.

          I don't see why we need to add extra complexity here. Trusting the client to encode correctly is what we've done on the Thrift side for years.

          Rick Shaw added a comment -

          Perhaps it is a JDBC-specific problem, but often tooling will use setInt() if it is a number to be stored in a string column, and more frequently setString() for an integer field. The concern is that the PreparedStatement suite of methods provides a lot of flexibility to do these kinds of implied transformations, which will be difficult without cooperation between the client and server. To do it on the client side only would require having the entire potential schema cached on the client side.

          Jonathan Ellis added a comment -

          I see, so you're saying you want the server to be able to respond with an error if the app dev wrote setString("0") for an int data type.

          That's reasonable, but that's separate from whether we're using BB or String for the bind variables, since you still have ambiguity in the String representation (as in the example above).

          Rick Shaw added a comment -

          Well, I assume the error will be thrown if the server can discern that an error exists, but without additional info there is no way to know it is not compatible outside of the verify() method in AbstractType. The problem is moot (or mooter?) if String is the mechanism of transfer.

          I don't see any ambiguity in String. Each method pushes a string representation (toString() or custom encoding); so as long as fromString() on the validator can handle it without exception, most variations will work. It will stumble on things like passing "xyz" to a numeric data type, but that is reported gracefully as a numeric conversion exception.

          My point has always been that IF we choose to pass ByteBuffer, then the server side would benefit from knowing how the data was encoded, both for checking for errors and for an enhanced ability to transform the data to the required type. I'll do what it takes on the client side either way.

          Eric Evans added a comment -

          Perhaps it is a JDBC-specific problem, but often tooling will use setInt() if it is a number to be stored in a string column, and more frequently setString() for an integer field. The concern is that the PreparedStatement suite of methods provides a lot of flexibility to do these kinds of implied transformations, which will be difficult without cooperation between the client and server. To do it on the client side only would require having the entire potential schema cached on the client side.

          I think I see what you're saying, and this just falls under the "strings are easier" argument. You're able to get away with more because it's the server that's doing the marshaling for you. For example, it doesn't matter whether the schema is Int32, Integer, Long, etc.; as long as you pass something that vaguely looks like a number, it'll do the Right Thing.

          With binary arguments you will need to keep a client-side copy of the schema so that you know how to encode each argument (like Thrift clients have been doing for some time).

          So if a user calls setString("10") where the schema is LongType, you'll need to first create a long from the string, and then marshal it to bytes for the request.

          Validation is also something that you're going to need to do client-side; I don't think there is any validation the server can do that it isn't doing already. For example, with numeric types, other than validating the length of the byte[] (think Long or Int32), there really aren't any bytes that would be invalid.

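          A hedged sketch of both halves of that argument, assuming a LongType column (all class and method names here are illustrative, not Cassandra's actual classes): the client parses the string and marshals it to 8 big-endian bytes, and about the only validation left to the server is a length check, since any 8 bytes decode to some long:

          import java.nio.ByteBuffer;

          public class ClientSideMarshal {
              // Client side: a setString("10") call against a LongType column must
              // first parse the long, then marshal it to the 8-byte wire form.
              static ByteBuffer encodeLongFromString(String s) {
                  long v = Long.parseLong(s);        // string -> long
                  ByteBuffer b = ByteBuffer.allocate(8);
                  b.putLong(v);                      // long -> bytes
                  b.flip();
                  return b;
              }

              // Server side: with pre-serialized longs, length is effectively the
              // only check left; every 8-byte value is a valid long.
              static void validateLong(ByteBuffer bytes) {
                  if (bytes.remaining() != 8)
                      throw new IllegalArgumentException(
                              "expected 8 bytes for a long, got " + bytes.remaining());
              }

              public static void main(String[] args) {
                  ByteBuffer b = encodeLongFromString("10");
                  validateLong(b);
                  System.out.println(b.getLong(0)); // prints 10
              }
          }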
          Rick Shaw added a comment -

          Yes. That is all exactly what I meant to say...

          It can all be done on the client side if you have the full current schema available, which, of course, is doable but expensive (in time) to get in place. ResultSet needs similar info, so there may be some sharing that can take place, but I haven't really looked yet.

          Validation, transformation, or corrective action could be done on the server side if you knew how it was encoded in the first place; hence my suggestion.

          Eric Evans added a comment -

          Validation, transformation, or corrective action could be done on the server side if you knew how it was encoded in the first place; hence my suggestion.

          That's pretty much what happens with string args, though. If you're going to be re-encoding byte arrays, then it sort of defeats the purpose of pre-serializing to bytes in the first place, no?

          Eric Evans added a comment -

          Given that the primary purpose of a "real" PS API is performance (otherwise we could just fake it client-side the way we used to with JDBC), and the feedback from client devs was mixed, I propose that we proceed with the binary PS API. Client implementers who do not wish to deal with this can continue to use the pure string-based, non-PS API, and they are no worse off than before.

          We don't exactly have consensus, but it's pretty obvious at this point that we probably never will. And as Rick points out, we need to nail down something.

          Does anyone want to take another look at the changes before committing?

          https://github.com/eevans/cassandra/tree/3634.rebased

          Rick Shaw added a comment -

          +1

          Changes look good to me.

          Sylvain Lebresne added a comment -

          It can all be done on the client side if you have the full current schema available, which, of course, is doable but expensive (in time) to get in place.

          I think we could send enough info with the CqlPreparedResult, i.e., replace the count by a list of types, like what we do for CqlResult. It would be simpler for drivers than keeping the full schema somewhere and probably parsing the initial prepared query to figure out which schema column each marker corresponds to.

          There would be the slight issue of someone changing the validation of a given value between preparation and execution, but I don't think it's a big deal at all to say that you'll have to re-prepare queries if you do that (how often do you actually change a value's validation function anyway? And even if you do, you'd better change it to something compatible with the previous type for CQL, so in fact most changes would not be a problem).

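          A speculative sketch of how a driver might consume such a result (PreparedMeta and encode() are hypothetical names, not the actual Thrift API): given one validator name per bind marker, the driver can encode each bind value without caching the whole schema or parsing the query text itself.

          import java.nio.ByteBuffer;
          import java.nio.charset.StandardCharsets;
          import java.util.Arrays;
          import java.util.List;

          public class TypedPreparedSketch {
              // Hypothetical stand-in for a CqlPreparedResult that carries one
              // validator name per bind marker instead of just a parameter count.
              static final class PreparedMeta {
                  final int itemId;
                  final List<String> paramTypes;
                  PreparedMeta(int itemId, List<String> paramTypes) {
                      this.itemId = itemId;
                      this.paramTypes = paramTypes;
                  }
              }

              // Encode a bind value using the per-marker type from prepare time.
              static ByteBuffer encode(String type, Object value) {
                  if (type.equals("LongType")) {
                      ByteBuffer b = ByteBuffer.allocate(8);
                      b.putLong(((Number) value).longValue());
                      b.flip();
                      return b;
                  } else if (type.equals("UTF8Type")) {
                      return ByteBuffer.wrap(
                              value.toString().getBytes(StandardCharsets.UTF_8));
                  }
                  throw new IllegalArgumentException("unhandled type: " + type);
              }

              public static void main(String[] args) {
                  PreparedMeta meta =
                          new PreparedMeta(1, Arrays.asList("LongType", "UTF8Type"));
                  ByteBuffer p0 = encode(meta.paramTypes.get(0), 10L);
                  ByteBuffer p1 = encode(meta.paramTypes.get(1), "hello");
                  System.out.println(p0.remaining() + " " + p1.remaining()); // "8 5"
              }
          }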
          Eric Evans added a comment -

          committed (7c92fc52)

          Jonathan Ellis added a comment -

          I think we could send enough info with the CqlPreparedResult, i.e, replace the count by a list of types

          Created CASSANDRA-3753 to follow up on that.


            People

            • Assignee:
              Eric Evans
            • Reporter:
              Eric Evans
            • Votes:
              1
            • Watchers:
              4