Cassandra > CASSANDRA-9259 Bulk Reading from Cassandra > CASSANDRA-11542

Create a benchmark to compare HDFS and Cassandra bulk read times


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: None
    • Component/s: Legacy/Testing
    • Labels: None

    Description

      I propose creating a benchmark for comparing Cassandra and HDFS bulk reading performance. Simple Spark queries will be run against data stored in HDFS or Cassandra, and the end-to-end duration of each query will be measured. Example queries: the max or min of a column, or a count(*).

      This benchmark should allow determining the impact of:

      • partition size
      • number of clustering columns
      • number of value columns (cells)
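
      The ticket does not include the harness itself, but the measurement it describes can be sketched as follows. This is a minimal, illustrative timing loop in plain Python (no Spark dependency); the `time_query` helper and the in-memory `column` are hypothetical stand-ins for a Spark query against HDFS or Cassandra:

      ```python
      import time

      def time_query(query_fn, *args):
          """Measure the end-to-end wall-clock duration of one query."""
          start = time.perf_counter()
          result = query_fn(*args)
          elapsed = time.perf_counter() - start
          return result, elapsed

      # A plain list stands in for a column read from HDFS or Cassandra.
      column = list(range(1_000_000))

      # The example queries from the description: max, min, count(*).
      for name, fn in [("max", max), ("min", min), ("count", len)]:
          result, secs = time_query(fn, column)
          print(f"{name}: result={result}, seconds={secs:.3f}")
      ```

      In the real benchmark the query function would be a Spark action (e.g. an aggregation over a DataFrame loaded from HDFS or from Cassandra), so the measured time includes the full read path, which is exactly what the comparison needs.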

      Attachments

        1. spark-load-perf-results-003.zip
          72 kB
          Stefania Alborghetti
        2. spark-load-perf-results-002.zip
          38 kB
          Stefania Alborghetti
        3. spark-load-perf-results-001.zip
          71 kB
          Stefania Alborghetti
        4. jfr_recordings.zip
          927 kB
          Stefania Alborghetti


    People

        • Assignee: Stefania Alborghetti
        • Reporter: Stefania Alborghetti
        • Reviewer: Russell Spitzer
        • Votes: 0
        • Watchers: 19

