  1. Cassandra
  2. CASSANDRA-11053

COPY FROM on large datasets: fix progress report and optimize performance part 4


    Details

    • Severity:
      Normal

      Description

      Running COPY FROM on a large dataset (20 GB split into 20M records) revealed two issues:

      • The progress report is incorrect: it advances very slowly until almost the end of the test, at which point it catches up extremely quickly.
      • Throughput in rows per second is similar to that of smaller tests run locally against a smaller cluster (approx. 35,000 rows per second). By comparison, cassandra-stress manages 50,000 rows per second under the same set-up, making it roughly 1.4 times faster.
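      To make the gap concrete, the quoted rates can be turned into a speed-up ratio and an estimated load time for the 20M-record dataset. This is only back-of-the-envelope arithmetic on the approximate figures given in this description; the function and variable names are illustrative, and the attached benchmark file remains the source for actual measurements.

```python
# Approximate figures quoted in the issue description.
TOTAL_ROWS = 20_000_000      # 20M records
COPY_FROM_RATE = 35_000      # rows/second observed with COPY FROM
STRESS_RATE = 50_000         # rows/second observed with cassandra-stress


def seconds_to_load(total_rows, rows_per_second):
    """Estimated wall-clock time to load total_rows at a constant rate."""
    return total_rows / rows_per_second


speedup = STRESS_RATE / COPY_FROM_RATE
copy_secs = seconds_to_load(TOTAL_ROWS, COPY_FROM_RATE)
stress_secs = seconds_to_load(TOTAL_ROWS, STRESS_RATE)

print(f"cassandra-stress speed-up over COPY FROM: {speedup:.2f}x")
print(f"estimated COPY FROM load time: {copy_secs / 60:.1f} min")
print(f"estimated cassandra-stress load time: {stress_secs / 60:.1f} min")
```

      The estimate assumes a constant rate for the whole run, which a real import is unlikely to sustain.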

      See attached file copy_from_large_benchmark.txt for the benchmark details.

      Doc-impacting changes to COPY FROM options
      • A new option was added: PREPAREDSTATEMENTS - it indicates if prepared statements should be used; it defaults to true.
      • The default value of CHUNKSIZE changed from 1000 to 5000.
      • The default value of MINBATCHSIZE changed from 2 to 10.
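      As a rough illustration of how these options appear in a cqlsh COPY FROM statement, the sketch below assembles one as a string. The helper `build_copy_from`, the table name `ks.large_table`, and the file name `data.csv` are hypothetical placeholders; only the option names and the default values come from the list above.

```python
def build_copy_from(table, csv_path, **options):
    """Assemble a cqlsh COPY FROM statement (hypothetical helper).

    Option names are upper-cased and booleans are rendered as
    true/false, matching how the options are written in cqlsh.
    """
    def render(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    clauses = " AND ".join(
        f"{name.upper()}={render(value)}" for name, value in options.items()
    )
    statement = f"COPY {table} FROM '{csv_path}'"
    return f"{statement} WITH {clauses};" if clauses else f"{statement};"


# Spell out the new defaults explicitly (table and file are placeholders).
stmt = build_copy_from(
    "ks.large_table",
    "data.csv",
    preparedstatements=True,
    chunksize=5000,
    minbatchsize=10,
)
print(stmt)
```

      Because these are the defaults, passing them explicitly should not change behaviour; the WITH clause matters when overriding them, for example PREPAREDSTATEMENTS=false.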

        Attachments

        1. worker_profiles.txt
          193 kB
          Stefania Alborghetti
        2. parent_profile.txt
          9 kB
          Stefania Alborghetti
        3. copy_from_large_benchmark.txt
          3 kB
          Stefania Alborghetti
        4. worker_profiles_2.txt
          61 kB
          Stefania Alborghetti
        5. parent_profile_2.txt
          9 kB
          Stefania Alborghetti
        6. copy_from_large_benchmark_2.txt
          5 kB
          Stefania Alborghetti
        7. bisect_test.py
          1 kB
          Stefania Alborghetti

              People

              • Assignee:
                Stefania Alborghetti
                Reporter:
                Stefania Alborghetti
                Authors:
                Stefania Alborghetti
                Reviewers:
                Adam Holmberg
