Cassandra / CASSANDRA-9259 Bulk Reading from Cassandra / CASSANDRA-11542

Create a benchmark to compare HDFS and Cassandra bulk read times


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: None
    • Component/s: Legacy/Testing
    • Labels: None

      Description

      I propose creating a benchmark to compare the bulk read performance of Cassandra and HDFS. Simple Spark queries will be run on data stored in HDFS or in Cassandra, and the end-to-end duration of each query will be measured. Example queries are the max or min of a column, or a count(*).

      This benchmark should allow determining the impact of:

      • partition size
      • number of clustering columns
      • number of value columns (cells)
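
      A minimal sketch of how such a benchmark matrix could be driven, assuming each storage backend is wrapped in a function that returns a zero-argument callable running one query. The names `Scenario`, `scenarios`, `time_query`, and `benchmark` are hypothetical and do not come from this issue or its attachments; the Spark job itself is left to the caller.

      ```python
      import itertools
      import statistics
      import time
      from dataclasses import dataclass
      from typing import Callable, Dict, Iterable, Iterator, List, Tuple


      @dataclass(frozen=True)
      class Scenario:
          """One benchmark configuration (hypothetical field names)."""
          partition_size: int       # rows per partition
          clustering_columns: int   # number of clustering columns
          value_columns: int        # number of value columns (cells)


      def scenarios(partition_sizes: Iterable[int],
                    clustering_counts: Iterable[int],
                    value_counts: Iterable[int]) -> Iterator[Scenario]:
          """Yield every combination of the three parameters under study."""
          for p, c, v in itertools.product(partition_sizes, clustering_counts, value_counts):
              yield Scenario(p, c, v)


      def time_query(run_query: Callable[[], object], repeats: int = 3) -> List[float]:
          """Run the query `repeats` times and return wall-clock durations in seconds."""
          durations = []
          for _ in range(repeats):
              start = time.perf_counter()
              run_query()  # e.g. a Spark action such as a count or a max over one column
              durations.append(time.perf_counter() - start)
          return durations


      def benchmark(sources: Dict[str, Callable[[Scenario], Callable[[], object]]],
                    grid: Iterable[Scenario],
                    repeats: int = 3) -> List[Tuple[str, Scenario, float]]:
          """Time every source against every scenario; report the median duration."""
          results = []
          for scenario in grid:
              for name, make_query in sources.items():
                  durations = time_query(make_query(scenario), repeats)
                  results.append((name, scenario, statistics.median(durations)))
          return results
      ```

      In a real run, the entries in `sources` (for example `"hdfs"` and `"cassandra"`) would each build a callable that loads the data for the given scenario through Spark and triggers one of the example queries. Reporting the median of several runs is a design choice made here to dampen warm-up effects, not something the issue specifies.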

        Attachments

        1. jfr_recordings.zip (927 kB, Stefania Alborghetti)
        2. spark-load-perf-results-001.zip (71 kB, Stefania Alborghetti)
        3. spark-load-perf-results-002.zip (38 kB, Stefania Alborghetti)
        4. spark-load-perf-results-003.zip (72 kB, Stefania Alborghetti)

              People

              • Assignee: Stefania Alborghetti
              • Reporter: Stefania Alborghetti
              • Authors: Stefania Alborghetti
              • Reviewers: Russell Spitzer
              • Votes: 0
              • Watchers: 20
