Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.9
    • Component/s: Core
    • Labels:
      None

      Description

      What we're not sure about is the effect on compaction efficiency:
      larger files mean that each level contains more data, so reads will
      have to touch fewer sstables, but we're also compacting more unchanged
      data when we merge forward.

      So the question is, how big can we make the sstables to get the benefits of the
      first effect, before the second effect starts to dominate?
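
      To make the first effect concrete, here is a rough back-of-the-envelope
      sketch (a hypothetical illustration, not Cassandra code). It assumes the
      standard LCS fanout, where L1 targets roughly 10 sstables and each deeper
      level is 10x larger than the previous one; the 40GB dataset size is chosen
      purely for illustration.

          // Rough model of how sstable size affects the number of LCS levels,
          // and therefore the worst-case number of sstables a read must touch
          // (at most one per level, plus any overlapping L0 sstables).
          public class LcsLevelMath {
              static int levelsNeeded(long totalBytes, long sstableBytes) {
                  long levelCapacity = 10 * sstableBytes; // L1 target: ~10 sstables
                  long covered = levelCapacity;
                  int levels = 1;
                  while (covered < totalBytes) {
                      levelCapacity *= 10;                // each level ~10x the previous
                      covered += levelCapacity;
                      levels++;
                  }
                  return levels;
              }

              public static void main(String[] args) {
                  long total = 40L * 1024 * 1024 * 1024;  // illustrative 40GB of data
                  for (long mb : new long[] {5, 160}) {
                      int levels = levelsNeeded(total, mb * 1024 * 1024);
                      System.out.printf("%dMB sstables -> %d levels, <=%d sstables per read (plus L0)%n",
                                        mb, levels, levels);
                  }
              }
          }

      With these assumptions, 5MB sstables spread 40GB across four levels while
      160MB sstables need only three, so the read-side benefit flattens out once
      the data fits in a small number of levels.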

      Attachments

      1. BytesRead_vs_LCS.png (34 kB, Daniel Meyer)
      2. ReadLatency_vs_LCS.png (56 kB, Daniel Meyer)
      3. Throughtput_vs_LCS.png (38 kB, Daniel Meyer)
      4. UpdateLatency_vs_LCS.png (44 kB, Daniel Meyer)

        Activity

        Robert Coli added a comment -

        Anecdotally, many people on @cassandra-user/#cassandra have been bitten by the current 5MB size. The types of negative experiences they have seem to relate to too many SSTables for "small" or "medium" data sizes. Even a relatively naive doubling of this default to 10MB seems likely to be a win for most of these users.

        Radim Kolar added a comment -

        leveldb uses 2MB by default. The real problem is the naive leveldb implementation in Cassandra.

        Jonathan Ellis added a comment - edited

        Don't be a troll. If you have constructive feedback, make it here or in another ticket. Multiple Cassandra developers have read the leveldb source; there is no magic there.

        For the record, leveldb is designed for low-concurrency, embedded purposes. Everyone who tries to use it for a multiuser database (Riak, HyperDex, probably others) has to do some serious surgery of the kind we've done (don't block writes for L0, concurrent compactions, etc.).

        Radim Kolar added a comment -

        You are missing three important points. There is no need to read the leveldb source code; just read the log file and you will notice differences in compaction which contribute to the speedup.

        I have implemented leveldb three times. The first was on HDFS, where metadata are stored in the path; the second was an improvement of the Cassandra code; and the third is an improved version of the current Google record-based backend, which is leveldb-based, modified to run without a filesystem, with the network part removed, plus two small performance tunings with major effect.

        You are right about concurrent compactions and non-blocking writes for L0.

        OK, you wanted a hint, so here it is: compare L0 in Cassandra 2.0 and leveldb.

        Daniel Meyer added a comment -

        I have conducted an investigation into the default LCS file size. YCSB was used to perform all tests. The system under test consisted of a single Rackspace node with 2GB of RAM. YCSB workloada was used for all tests; it consists of a 50/50 read/update workload with the total number of operations set to 900K. The amount of data was varied from 4GB to 40GB.
        LCS file size was varied for the 4GB tests as 5MB, 10MB, 20MB, 160MB, 320MB, 475MB, 640MB and 1280MB. LCS file size for the 40GB tests was varied as 5MB, 40MB, 80MB, 160MB, 320MB and 640MB.

        It is important to note that the 40GB test was not runnable with the current default LCS file size of 5MB due to consistent OOM errors. Those OOM issues go away with increased LCS file size.

        Based upon the data from this experiment, an LCS file size of 160MB would be an optimal default value. Please see the attached graphs.
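
        As a rough illustration of why the 5MB default breaks down at 40GB (this is
        simple arithmetic added for context, not part of the test harness): each
        sstable carries its own in-memory bookkeeping (bloom filter, index summary,
        open-file state), so memory pressure scales with the sstable count rather
        than the data size.

            // Hypothetical arithmetic: sstable counts for a 40GB dataset at two
            // different LCS sstable sizes. Per-sstable structures (bloom filters,
            // index summaries, etc.) multiply by this count, which is consistent
            // with the OOMs disappearing once the files are made larger.
            public class SstableCount {
                public static void main(String[] args) {
                    long totalBytes = 40L * 1024 * 1024 * 1024;   // 40GB, as in the test
                    for (long mb : new long[] {5, 160}) {
                        long count = totalBytes / (mb * 1024 * 1024);
                        System.out.printf("%dMB sstables -> ~%d sstables%n", mb, count);
                    }
                    // 5MB -> ~8192 sstables; 160MB -> ~256 sstables (about 32x fewer).
                }
            }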

        Radim Kolar added a comment -

        Did you try to measure the standard compaction strategy to see if 160MB LCS brings improvements?

        Jonathan Ellis added a comment -

        "Based upon the data from this experiment, an LCS file size of 160MB would be an optimal default value."

        Thanks, Daniel. Done in d2f43e41cad76fccc666b0d90ef9a20df221e22e.

        "Did you try to measure the standard compaction strategy to see if 160MB LCS brings improvements?"

        Did you look at the graphs? I don't see how an STCS control run would tell us anything new.

        T Jake Luciani added a comment -

        Daniel Meyer: Did you track compaction time across sizes?

        Jonathan Ellis added a comment -

        Yes. Time was proportional to the bytes compacted [see BytesRead_vs_LCS.png].


          People

          • Assignee: Daniel Meyer
          • Reporter: Jonathan Ellis
          • Reviewer: Jonathan Ellis
          • Votes: 0
          • Watchers: 6
