  1. Cassandra
  2. CASSANDRA-9259 Bulk Reading from Cassandra
  3. CASSANDRA-11542

Create a benchmark to compare HDFS and Cassandra bulk read times


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: None
    • Component/s: Legacy/Testing
    • Labels: None

    Description

      I propose creating a benchmark to compare the bulk read performance of Cassandra and HDFS. Simple Spark queries will be run on data stored in HDFS or in Cassandra, and the end-to-end duration of each query will be measured. Example queries are the max or min of a column, or a count(*).

      This benchmark should allow determining the impact of:

      • partition size
      • number of clustering columns
      • number of value columns (cells)
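
The benchmark described above could be organised roughly as follows. This is only a minimal sketch of a timing harness, not the code attached to this ticket: the names `Schema`, `schema_grid`, `time_query` and `run_benchmark` are hypothetical, and the use of a median over a few repeats is an assumption made for illustration. Each data source is represented by a factory that returns a zero-argument callable, so the harness itself does not depend on Spark.

```python
import itertools
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Schema:
    """One point in the parameter space the ticket wants to explore (hypothetical type)."""
    rows_per_partition: int   # proxy for partition size
    clustering_columns: int
    value_columns: int


def schema_grid(partition_sizes: Iterable[int],
                clustering_counts: Iterable[int],
                value_counts: Iterable[int]) -> List[Schema]:
    """Build every combination of the three parameters."""
    return [Schema(p, c, v)
            for p, c, v in itertools.product(partition_sizes,
                                             clustering_counts,
                                             value_counts)]


def time_query(run_query: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # e.g. a Spark max/min/count(*) job, run to completion
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def run_benchmark(schemas: Iterable[Schema],
                  sources: Dict[str, Callable[[Schema], Callable[[], object]]],
                  repeats: int = 3) -> List[Tuple[Schema, str, float]]:
    """Time the query for every (schema, source) pair."""
    results = []
    for schema in schemas:
        for name, make_query in sources.items():
            results.append((schema, name, time_query(make_query(schema), repeats)))
    return results
```

In an actual run, the factory for each source would return a closure that executes a Spark job against data generated for that schema, for example a `count(*)` over files in HDFS versus the same query over a Cassandra table. The file format on the HDFS side and the connector used on the Cassandra side are not specified in this description and would have to be chosen separately.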

      Attachments

        1. spark-load-perf-results-003.zip (72 kB, Stefania Alborghetti)
        2. jfr_recordings.zip (927 kB, Stefania Alborghetti)
        3. spark-load-perf-results-002.zip (38 kB, Stefania Alborghetti)
        4. spark-load-perf-results-001.zip (71 kB, Stefania Alborghetti)


            People

              Stefania Alborghetti
              Russell Spitzer
