Cassandra / CASSANDRA-3943

Too many small sstables after loading data using sstableloader or BulkOutputFormat increase compaction time.


    Details

      Description

      When we create sstables using SimpleUnsortedWriter or BulkOutputFormat, each sstable is roughly the size of the configured buffer.
      After loading, however, the sstables created on the cluster nodes are each approximately:

      ( (sstable_size_before_loading) * replication_factor ) / No_Of_Nodes_In_Cluster

      As the number of nodes in the cluster increases, the size of each sstable loaded onto a Cassandra node decreases. Such small sstables take much longer to compact (minor compaction) than relatively large ones.
      One workaround we have tried is to increase the buffer size when generating sstables, but as the buffer size increases, so does the time taken to generate them. Is there a solution to this in existing versions, or will this be fixed in a future version?
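To make the size estimate in the description concrete, here is a minimal Python sketch of the arithmetic. It is not part of Cassandra: the function names, the 64 MB buffer, and the replication factor of 3 are illustrative assumptions, and the calculation simply applies the reporter's approximation (which presumes data is spread evenly across nodes).

```python
def loaded_sstable_size_mb(size_before_loading_mb, replication_factor, nodes):
    """Approximate per-node sstable size after loading, using the
    estimate from the description: (size * RF) / number_of_nodes."""
    if nodes <= 0 or replication_factor <= 0:
        raise ValueError("nodes and replication_factor must be positive")
    return size_before_loading_mb * replication_factor / nodes


def buffer_size_for_target_mb(target_mb, replication_factor, nodes):
    """Inverse of the estimate: the generation buffer size needed so that
    loaded sstables end up at roughly target_mb on each node."""
    if nodes <= 0 or replication_factor <= 0:
        raise ValueError("nodes and replication_factor must be positive")
    return target_mb * nodes / replication_factor


if __name__ == "__main__":
    # Hypothetical inputs: 64 MB buffer, replication factor 3.
    for nodes in (3, 6, 12, 24):
        size = loaded_sstable_size_mb(64, 3, nodes)
        print(f"{nodes} nodes: ~{size} MB per loaded sstable")
```

Under these assumptions, the loaded sstable size halves each time the node count doubles, and the buffer needed to hold a fixed per-node size grows linearly with the cluster, which is the trade-off the reporter describes.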

            People

            • Assignee: Unassigned
            • Reporter: Samarth Gahire (samarthg1986)
            • Votes: 0
            • Watchers: 2

                Time Tracking

                Estimated: 168h
                Remaining: 168h
                Logged: Not Specified