Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Fixed
Description
Running COPY FROM on a large dataset (20 GB split into 20 million records) revealed two issues:
- The progress report is inaccurate: it advances very slowly until almost the end of the test, at which point it catches up extremely quickly.
- Throughput in rows per second is similar to that of smaller tests run locally on a smaller cluster (approximately 35,000 rows per second). For comparison, cassandra-stress achieves 50,000 rows per second under the same set-up, roughly 1.5 times faster.
See the attached file copy_from_large_benchmark.txt for the benchmark details.
Doc-impacting changes to COPY FROM options
- A new option was added: PREPAREDSTATEMENTS - it indicates whether prepared statements should be used; it defaults to true.
- The default value of CHUNKSIZE changed from 1000 to 5000.
- The default value of MINBATCHSIZE changed from 2 to 10.
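The new defaults apply automatically; the options only need to be specified to override them. A minimal sketch of how they might be set explicitly in cqlsh (the keyspace, table, column names, and CSV path here are hypothetical, chosen only for illustration):

```sql
-- Hypothetical table ks1.lineitem and file lineitem.csv.
-- PREPAREDSTATEMENTS defaults to true; setting it to false falls back
-- to non-prepared statements. CHUNKSIZE and MINBATCHSIZE shown at the
-- new default values (5000 and 10 respectively).
COPY ks1.lineitem (id, payload) FROM 'lineitem.csv'
WITH PREPAREDSTATEMENTS = false AND CHUNKSIZE = 5000 AND MINBATCHSIZE = 10;
```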
Attachments
Issue Links
- blocks
  - CASSANDRA-11274 cqlsh: interpret CQL type for formatting blob types (Resolved)
- breaks
  - CASSANDRA-11549 cqlsh: COPY FROM ignores NULL values in conversion (Resolved)
  - CASSANDRA-11574 cqlsh: COPY FROM throws TypeError with Cython extensions enabled (Resolved)
- is related to
  - CASSANDRA-9302 Optimize cqlsh COPY FROM, part 3 (Resolved)
- relates to
  - CASSANDRA-11630 Make cython optional in pylib/setup.py (Resolved)
  - CASSANDRA-11255 COPY TO should have higher double precision (Resolved)
  - CASSANDRA-11274 cqlsh: interpret CQL type for formatting blob types (Resolved)