CASSANDRA-11053

COPY FROM on large datasets: fix progress report and optimize performance part 4

Details

    • Normal

    Description

      Running COPY FROM on a large dataset (20 GB split into 20M records) revealed two issues:

      • The progress report is incorrect: it advances very slowly until almost the end of the test, at which point it catches up extremely quickly.
      • The performance in rows per second is similar to that of smaller tests run locally against a smaller cluster (approx. 35,000 rows per second). By comparison, cassandra-stress manages 50,000 rows per second under the same set-up, which makes it roughly 1.4 times faster.
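
As a rough illustration of what the throughput gap means for a dataset of this size, the sketch below turns the quoted rates into a speed ratio and estimated load times. It assumes both tools would load the same 20M rows at a constant rate, which is a simplification; the function and constant names are hypothetical and not taken from the benchmark files.

```python
# Figures quoted in the description; both rates are approximate.
TOTAL_ROWS = 20_000_000      # 20M records in the dataset
COPY_FROM_RATE = 35_000      # rows/s reported for COPY FROM
STRESS_RATE = 50_000         # rows/s reported for cassandra-stress


def load_seconds(total_rows: int, rows_per_second: int) -> float:
    """Seconds needed to load total_rows at a constant rate (an assumption)."""
    return total_rows / rows_per_second


# Ratio of the two quoted rates.
speedup = STRESS_RATE / COPY_FROM_RATE
copy_seconds = load_seconds(TOTAL_ROWS, COPY_FROM_RATE)
stress_seconds = load_seconds(TOTAL_ROWS, STRESS_RATE)

print(f"cassandra-stress / COPY FROM rate ratio: {speedup:.2f}")
print(f"COPY FROM estimate:        {copy_seconds / 60:.1f} min")
print(f"cassandra-stress estimate: {stress_seconds / 60:.1f} min")
```

These estimates ignore start-up cost and any variation in rate during the run, so they are only a back-of-the-envelope check against the attached benchmark.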

      See attached file copy_from_large_benchmark.txt for the benchmark details.

      Doc-impacting changes to COPY FROM options
      • A new option was added: PREPAREDSTATEMENTS, which indicates whether prepared statements should be used; it defaults to true.
      • The default value of CHUNKSIZE changed from 1000 to 5000.
      • The default value of MINBATCHSIZE changed from 2 to 10.
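
To make the new defaults concrete, here is a minimal sketch that builds a cqlsh COPY FROM statement with the three options spelled out explicitly. The helper `build_copy_from`, the table name and the CSV path are hypothetical and used only for illustration; the option names and default values are the ones listed above.

```python
def build_copy_from(table, csv_path, chunk_size=5000, min_batch_size=10,
                    prepared_statements=True):
    """Return a cqlsh COPY FROM statement with the options made explicit.

    The defaults mirror the new values described in this ticket.
    """
    options = {
        "CHUNKSIZE": chunk_size,
        "MINBATCHSIZE": min_batch_size,
        # cqlsh expects lowercase true/false for boolean options
        "PREPAREDSTATEMENTS": str(prepared_statements).lower(),
    }
    with_clause = " AND ".join(f"{name}={value}" for name, value in options.items())
    return f"COPY {table} FROM '{csv_path}' WITH {with_clause};"


# Hypothetical keyspace, table and file names.
print(build_copy_from("ks1.table1", "data.csv"))
# Restore the previous defaults and disable prepared statements.
print(build_copy_from("ks1.table1", "data.csv", chunk_size=1000,
                      min_batch_size=2, prepared_statements=False))
```

The second call shows how the earlier behaviour could be requested explicitly, for example when comparing runs before and after the change.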

      Attachments

        1. bisect_test.py
          1 kB
          Stefania Alborghetti
        2. copy_from_large_benchmark_2.txt
          5 kB
          Stefania Alborghetti
        3. parent_profile_2.txt
          9 kB
          Stefania Alborghetti
        4. worker_profiles_2.txt
          61 kB
          Stefania Alborghetti
        5. copy_from_large_benchmark.txt
          3 kB
          Stefania Alborghetti
        6. parent_profile.txt
          9 kB
          Stefania Alborghetti
        7. worker_profiles.txt
          193 kB
          Stefania Alborghetti

            People

              Stefania Alborghetti
              Adam Holmberg
              Votes: 0
              Watchers: 10
