Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.0 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      The row (partition) cache easily does more harm than good. People expect it to act like a query cache, but it is very different from that, especially for the wide partitions that are so common in Cassandra data models.

      Making it off-heap by default only helped a little; we still have to deserialize the partition to the heap to query it.
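
      As a rough illustration of why an off-heap cache still costs heap on every read, here is a minimal sketch. It is not Cassandra's implementation: the class OffHeapCacheSketch, its methods, and the "partition as a list of strings" layout are all hypothetical, chosen only to show that data stored in a direct buffer has to be copied back and deserialized into heap objects before a single cell can be read.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Cassandra code: a cache that keeps each
// "partition" (here just a list of cell values) serialized off-heap.
class OffHeapCacheSketch {
    private final Map<String, ByteBuffer> cache = new HashMap<>();

    // Serialize the whole partition into a direct (off-heap) buffer.
    static ByteBuffer serialize(List<String> cells) {
        List<byte[]> encoded = new ArrayList<>();
        int size = Integer.BYTES; // cell count
        for (String cell : cells) {
            byte[] bytes = cell.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            size += Integer.BYTES + bytes.length; // length prefix + payload
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        buf.putInt(cells.size());
        for (byte[] bytes : encoded) {
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        buf.flip();
        return buf;
    }

    // Rebuild the whole partition as heap objects.
    static List<String> deserialize(ByteBuffer stored) {
        ByteBuffer buf = stored.duplicate(); // independent read position
        int count = buf.getInt();
        List<String> cells = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buf.getInt()];
            buf.get(bytes); // copy from off-heap memory to the heap
            cells.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return cells;
    }

    void put(String key, List<String> cells) {
        cache.put(key, serialize(cells));
    }

    // Reading one cell still deserializes the entire partition first.
    String readCell(String key, int index) {
        ByteBuffer stored = cache.get(key);
        if (stored == null) {
            return null;
        }
        return deserialize(stored).get(index);
    }

    public static void main(String[] args) {
        OffHeapCacheSketch sketch = new OffHeapCacheSketch();
        sketch.put("user:1", Arrays.asList("alice", "alice@example.com", "NYC"));
        System.out.println(sketch.readCell("user:1", 1));
    }
}
```

      In this sketch the cost of readCell grows with the size of the cached partition rather than with the size of the requested cell, which is the reason wide partitions fare poorly.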

      Ultimately we can add a better cache based on the ideas in CASSANDRA-1956 or CASSANDRA-2864, but even if we don't get to that until 2.1, removing the old row cache for 2.0 is a good idea.

      1. 5348.txt
        25 kB
        Jonathan Ellis

          Activity

          Sylvain Lebresne added a comment -

          For the record, I'm not convinced this is a good idea.

          I wholeheartedly agree that we need something better and that the row cache is not to be used in every workload.

          But in the meantime, there are workloads in which the row cache works well. My experience with a relatively standard use of Cassandra was that about 1/3 of the column families were simple static column families. For those, the row cache is useful and does more good than harm. That's not nothing.

          And more importantly, I don't understand the rush in removing it. Chances are, we'll have a better solution by 2.1 (and I'm all for making that a priority of 2.1). What should the people who use the row cache correctly do? They'll have to set up a memcache or some other external solution just for 2.0? Seems like we're penalizing people who use a tool correctly because some others don't (and I'm not blaming new people for expecting the row cache to be a query cache, but I do have a problem penalizing experienced people).

          What if instead we continue to educate people just one more version on when to not use the current row cache (let's blog about it, let's send a mail on the user list, ...)?

          Anyway, just wanted to voice my opinion.

          Jason Brown added a comment -

          I agree with Sylvain on this. We've spent a lot of time and effort pitching row caching, and it might be a bit of a kick in the shins to pull it out on those users who've decided to give it a shot (and are using it successfully). There are a few (few, mind you) reasonable uses of row cache here at NFLX, and those users are happy with it.

          I'll agree about the education piece, but I think it would be good to have our rewrite story reasonably organized before asking the community to avoid row caches. That way, we can have a smoother transition for users.

          Unless the effort of working around the current implementation is a complete road block to getting the new hotness up and running, I think we should keep the existing solution around. Making the rewrite a priority for 2.1 gets a +1 from me.

          Jonathan Ellis added a comment -

          How about removing on-heap cache as a compromise?

          Jason Brown added a comment -

          Still looking for some blood, huh? I think that is a reasonable compromise, as we resolve some technical debt yet retain the better part of that implementation (the off-heap part).

          So, +1 to Jonathan Ellis's suggestion.

          Jonathan Ellis added a comment -

          Patch to remove on-heap row cache.

          Jason Brown added a comment -

          code lgtm, but tests couldn't compile

              [javac] /usr/local/src/cassandra/test/unit/org/apache/cassandra/db/CollationControllerTest.java:73: error: constructor CollationController in class CollationController cannot be applied to given types;
              [javac]         CollationController controller = new CollationController(store, false, filter, Integer.MIN_VALUE);
              [javac]                                          ^
              [javac]   required: ColumnFamilyStore,QueryFilter,int
              [javac]   found: ColumnFamilyStore,boolean,QueryFilter,int
              [javac]   reason: actual and formal argument lists differ in length
              [javac] /usr/local/src/cassandra/test/unit/org/apache/cassandra/db/CollationControllerTest.java:81: error: constructor CollationController in class CollationController cannot be applied to given types;
              [javac]         controller = new CollationController(store, false, filter, Integer.MIN_VALUE);
              [javac]                      ^
              [javac]   required: ColumnFamilyStore,QueryFilter,int
              [javac]   found: ColumnFamilyStore,boolean,QueryFilter,int
              [javac]   reason: actual and formal argument lists differ in length
          

          Once I removed the boolean 'false' argument from the constructor calls, it compiled. Running tests now.

          Jonathan Ellis added a comment -

          fixed + committed

          Jason Brown added a comment -

          tests completed and passed.

          Robert Coli added a comment -

          I understand and agree with the idea of removing the Row Cache as likely to be hazardous to most users.

          I do not, however, understand removing the on-heap Row Cache and keeping the off-heap one.

          Problems with on-heap cache:

          1) if you make it too big, you consume too much heap

          Problems with off-heap cache:

          1) still consumes heap despite being off-heap, including marginal heap on each read/write
          2) serialize-deserialize penalty on read/write
          3) invalidates on write

          Other than the fact that the on-heap is more likely to cause you problems by running out of heap if it is too large, it seems on its face to be a better implementation of the row cache concept than the off-heap row cache. If we already accept that the Row Cache is for use by people who know what they are doing... aren't those users likely to actually prefer the on-heap cache, especially in 2.0 where heap pressure is the least severe it has ever been? Is there something I'm missing about what makes the on-heap cache so bad?

          tl;dr: I +1 Sylvain's comments above, but with some questions re on-heap vs. off-heap.

          Michael Andrews added a comment (edited) -

          +1 to Robert and Sylvain's comments. We were planning to use the on-heap row cache as a memcache replacement for a very small but heavily used table, but now our upgrade to 2.0 has put us back at the drawing board. I thought Cassandra was built for system engineers who knew the risks of what they were doing and had the flexibility to make those choices? I can wait until 2.1 for a better caching mechanism, but this seems like something that could have stayed with a disclaimer and more documentation on how to use it appropriately.


            People

            • Assignee:
              Jonathan Ellis
              Reporter:
              Jonathan Ellis
              Reviewer:
              Jason Brown
            • Votes:
              0
              Watchers:
              8
