Cassandra
  1. Cassandra
  2. CASSANDRA-3545

Fix very low Secondary Index performance

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.0
    • Component/s: Core
    • Labels:
      None

      Description

      While performing index search + value filtering over large Index Row ( ~100k keys per index value) with chunks (size of 512-1024 keys) search time is about 8-12 seconds, which is very very low.

      After profiling I got this picture:

      60% of search time is calculating MD5 hash with MessageDigester (Of cause it is because of RundomPartitioner).
      33% of search time (half of all MD5 hash calculating time) is double calculating of MD5 for comparing two row keys while rotating Index row to startKey (when performing search query for next chunk).

      I see several performance improvements:

      1) Use good algorithm to search startKey in sorted collection, that is faster then iteration over all keys. This solution is on first place because it simple, need only local code changes and should solve problem (increase search in multiple times).

      2) Don't calculate MD5 hash for startKey every time. It's optimal to compute it once (so search will be twice faster).
      Also need local code changes.

      3) Think about something faster that MD5 for hashing (like TigerRandomPartitioner with Tiger/128 hash).
      Need research and maybe this research was done.

      4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and sorting keys in index rows. In index rows, keys can be stored and compared with simple Byte Comparator.
      This solution requires huge code changes.

      I'm going to start from first solution. Next improvements can be done with next tickets.

      1. 0001-3545.patch
        10 kB
        Sylvain Lebresne
      2. 0002-cleanup.patch
        14 kB
        Sylvain Lebresne

        Activity

        Hide
        Evgeny Ryabitskiy added a comment -

        Screen from profiler

        Show
        Evgeny Ryabitskiy added a comment - Screen from profiler
        Hide
        Evgeny Ryabitskiy added a comment -

        Patch for 1) solution, that is using fast algorithm for search element in sorted collection with much less compare usage.

        Actually this solution can improve any ColumnSlice resolving.

        Show
        Evgeny Ryabitskiy added a comment - Patch for 1) solution, that is using fast algorithm for search element in sorted collection with much less compare usage. Actually this solution can improve any ColumnSlice resolving.
        Hide
        Jonathan Ellis added a comment -

        That sounds like a good place to start. Thanks for the analysis!

        Show
        Jonathan Ellis added a comment - That sounds like a good place to start. Thanks for the analysis!
        Hide
        Evgeny Ryabitskiy added a comment -

        New version of patch with fix of bugs and clear code.

        Show
        Evgeny Ryabitskiy added a comment - New version of patch with fix of bugs and clear code.
        Hide
        Evgeny Ryabitskiy added a comment -

        Patch is ready, please review.

        I have check search over index with 70k keys.
        Result: search is 3 times faster, profiling show no more bottleneck on MD5 hash calculation.

        Show
        Evgeny Ryabitskiy added a comment - Patch is ready, please review. I have check search over index with 70k keys. Result: search is 3 times faster, profiling show no more bottleneck on MD5 hash calculation.
        Hide
        Jonathan Ellis added a comment -

        Wow. That's a really clever workaround to working on a generic Collection iterator, that we happen to know is sorted. But it's also kind of complicated, and does a bunch of copying to a temporary collection that I'd prefer to avoid.

        What if we added a getSortedColumns(byte[] startWith) overload to AbstractColumnContainer + ISortedColumns? Then each ISortedColumns implementation could implement that the straightforward way (for ArrayBacked, with Collections.binarySearch + subList; for the NavigableMap based ones, with tailMap).

        Show
        Jonathan Ellis added a comment - Wow. That's a really clever workaround to working on a generic Collection iterator, that we happen to know is sorted. But it's also kind of complicated, and does a bunch of copying to a temporary collection that I'd prefer to avoid. What if we added a getSortedColumns(byte[] startWith) overload to AbstractColumnContainer + ISortedColumns? Then each ISortedColumns implementation could implement that the straightforward way (for ArrayBacked, with Collections.binarySearch + subList; for the NavigableMap based ones, with tailMap).
        Hide
        Evgeny Ryabitskiy added a comment -

        Yep, a little bit complicated. That is why it is fully covered by JUnits and comments.
        Cobertura reports:
        Classes in this File Line Coverage Branch Coverage Complexity
        CollectionSearchUtil 91% 51/56 81% 36/44 6,5

        There is NO bunch of copying to a temporary collection. There only one array of size sqrt(2*N), allocated once and coping is linear (same as iterating).

        Let's analyze resources required for this search on N size collection:

        Memory usage: == sqrt(2*N)
        Array copping: <= N
        Iteration ( next() execution): <= N
        Compare (with MD5/Column comparator): <= sqrt(2*N)

        I have solution (see patch 1) that perform iterating instead of allocation array, but it will require O(N^2) iterating in worst case.

        In second patch memory usage is trade of for only one passage with iterator. Iteration can be slow, so array is much better.

        You can check that for million columns search (pretty big row) it will be array with length: ~1440. Not too much for such huge search I think .
        In case of 10k columns row it will be only 144 length array, with is pretty few.

        About getSortedColumns(byte[] startWith):
        Yes, it is another good solution. binarySearch is little bit faster in case you have indexed access to underling Columns (like List or Array).

        But there is still one disadvantage: My patch is solving this problem and changes only few code lines
        and this solution requires much more code changes. Lot's of code changes - low release stability. Sorry for sharing pain in JIRA tickets but 1.0.3 seems to be last stable release

        As a compromise I can suggest to apply this patch and add ticket for feature to cleanup code, move to binary search and new API in ISortedColumns.

        Show
        Evgeny Ryabitskiy added a comment - Yep, a little bit complicated. That is why it is fully covered by JUnits and comments. Cobertura reports: Classes in this File Line Coverage Branch Coverage Complexity CollectionSearchUtil 91% 51/56 81% 36/44 6,5 There is NO bunch of copying to a temporary collection. There only one array of size sqrt(2*N), allocated once and coping is linear (same as iterating). Let's analyze resources required for this search on N size collection: Memory usage: == sqrt(2*N) Array copping: <= N Iteration ( next() execution): <= N Compare (with MD5/Column comparator): <= sqrt(2*N) I have solution (see patch 1) that perform iterating instead of allocation array, but it will require O(N^2) iterating in worst case. In second patch memory usage is trade of for only one passage with iterator. Iteration can be slow, so array is much better. You can check that for million columns search (pretty big row) it will be array with length: ~1440. Not too much for such huge search I think . In case of 10k columns row it will be only 144 length array, with is pretty few. About getSortedColumns(byte[] startWith): Yes, it is another good solution. binarySearch is little bit faster in case you have indexed access to underling Columns (like List or Array). But there is still one disadvantage: My patch is solving this problem and changes only few code lines and this solution requires much more code changes. Lot's of code changes - low release stability. Sorry for sharing pain in JIRA tickets but 1.0.3 seems to be last stable release As a compromise I can suggest to apply this patch and add ticket for feature to cleanup code, move to binary search and new API in ISortedColumns.
        Hide
        Jonathan Ellis added a comment -

        As a compromise I can suggest to apply this patch and add ticket for feature to cleanup code, move to binary search and new API in ISortedColumns.

        I'm afraid that's typically not how we work. If we see a clearly better approach during review, which this qualifies as, then we'll wait for that before committing. Especially when the goal is to get into a stable branch.

        Show
        Jonathan Ellis added a comment - As a compromise I can suggest to apply this patch and add ticket for feature to cleanup code, move to binary search and new API in ISortedColumns. I'm afraid that's typically not how we work. If we see a clearly better approach during review, which this qualifies as, then we'll wait for that before committing. Especially when the goal is to get into a stable branch.
        Hide
        Sylvain Lebresne added a comment -

        I agree with Jonathan than interning this inside the column family feels cleaner (and is more efficient). Attaching patch to do that (actually 2 patch, the second one does some cleaning of the comparator being given to lots of methods that don't care about it or can get it by other means). The patches are against trunk since I don't think we should push that into a stable release (independently of the actual implementation).

        Note that this only applies to memtable, so this has probably much more impact on small benchmarks (where you insert and get immediately) than it will have in real life (it's still an improvement, don't get me wrong).

        For the rest:

        2) Don't calculate MD5 hash for startKey every time. It's optimal to compute it once (so search will be twice faster).

        Unfortunately I don't see much way to do this any cleanly, without breaking badly the comparator abstraction.

        3) Think about something faster that MD5 for hashing (like TigerRandomPartitioner with Tiger/128 hash).

        It could be worth checking, though a quick search doesn't seem to return much interesting things. Finding a faster MD5 implementation would be convenient too, but the only thing I've found so far is http://twmacinta.com/myjava/fast_md5.php, which is unfortunately incompatible with our licence.

        4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and sorting keys in index rows. In index rows, keys can be stored and compared with simple Byte Comparator

        Imo, that's the most promising option. I don't think that would be very complicated to do (I actually think it would be pretty easy but I may be forgetting a difficulty), but the annoying part will likely be how to deal with the upgrade/backward compatibility. I may give it a shot at some point though.

        Show
        Sylvain Lebresne added a comment - I agree with Jonathan than interning this inside the column family feels cleaner (and is more efficient). Attaching patch to do that (actually 2 patch, the second one does some cleaning of the comparator being given to lots of methods that don't care about it or can get it by other means). The patches are against trunk since I don't think we should push that into a stable release (independently of the actual implementation). Note that this only applies to memtable, so this has probably much more impact on small benchmarks (where you insert and get immediately) than it will have in real life (it's still an improvement, don't get me wrong). For the rest: 2) Don't calculate MD5 hash for startKey every time. It's optimal to compute it once (so search will be twice faster). Unfortunately I don't see much way to do this any cleanly, without breaking badly the comparator abstraction. 3) Think about something faster that MD5 for hashing (like TigerRandomPartitioner with Tiger/128 hash). It could be worth checking, though a quick search doesn't seem to return much interesting things. Finding a faster MD5 implementation would be convenient too, but the only thing I've found so far is http://twmacinta.com/myjava/fast_md5.php , which is unfortunately incompatible with our licence. 4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and sorting keys in index rows. In index rows, keys can be stored and compared with simple Byte Comparator Imo, that's the most promising option. I don't think that would be very complicated to do (I actually think it would be pretty easy but I may be forgetting a difficulty), but the annoying part will likely be how to deal with the upgrade/backward compatibility. I may give it a shot at some point though.
        Hide
        Jonathan Ellis added a comment -

        Think about something faster that MD5 for hashing

        I've suggested a MRP (MurmurRandomPartitioner) in the past... Murmur is substantially faster than MD5, especially v3 (CASSANDRA-2975), and with CASSANDRA-1034 done we don't need to rely on tokens being unique. Murmur gives quite good hash distribution, which is the main thing we care about for partitioning.

        Show
        Jonathan Ellis added a comment - Think about something faster that MD5 for hashing I've suggested a MRP (MurmurRandomPartitioner) in the past... Murmur is substantially faster than MD5, especially v3 ( CASSANDRA-2975 ), and with CASSANDRA-1034 done we don't need to rely on tokens being unique. Murmur gives quite good hash distribution, which is the main thing we care about for partitioning.
        Hide
        Jonathan Ellis added a comment -
        .   public Iterator<IColumn> iterator(ByteBuffer start)
            {
                final ListIterator<IColumn> iter = listIterator(size());
                int idx = binarySearch(start);
                if (idx < 0)
                    idx = -idx-1;
                else if (reversed)
                    idx++;
                return reversed ? reverseInternalIterator(idx) : listIterator(idx);
            }
        

        This doesn't look quite right to me, shouldn't an exact match (idx >= 0) be left alone for the reversed case too?

        (Nit: whitespace around the - 1.)

        Show
        Jonathan Ellis added a comment - . public Iterator<IColumn> iterator(ByteBuffer start) { final ListIterator<IColumn> iter = listIterator(size()); int idx = binarySearch(start); if (idx < 0) idx = -idx-1; else if (reversed) idx++; return reversed ? reverseInternalIterator(idx) : listIterator(idx); } This doesn't look quite right to me, shouldn't an exact match (idx >= 0) be left alone for the reversed case too? (Nit: whitespace around the - 1.)
        Hide
        Sylvain Lebresne added a comment -

        That has to do with the ListIterator used. When you do a reverse iteration (using previous), the first returned is the previous of the index taken by the iterator constructor. The +1 correct that. There is a unit test to check this is correct

        I can add a comment (and fix the whitespace) during commit if there is no other remark/problem.

        Show
        Sylvain Lebresne added a comment - That has to do with the ListIterator used. When you do a reverse iteration (using previous), the first returned is the previous of the index taken by the iterator constructor. The +1 correct that. There is a unit test to check this is correct I can add a comment (and fix the whitespace) during commit if there is no other remark/problem.
        Hide
        Sylvain Lebresne added a comment -

        I've suggested a MRP (MurmurRandomPartitioner) in the past...

        That's a good idea. I've create CASSANDRA-3594 for that and CASSANDRA-3595 to explore the idea of use byte comparison for secondary indexes. So we can close this once once the 'better search' is committed.

        Show
        Sylvain Lebresne added a comment - I've suggested a MRP (MurmurRandomPartitioner) in the past... That's a good idea. I've create CASSANDRA-3594 for that and CASSANDRA-3595 to explore the idea of use byte comparison for secondary indexes. So we can close this once once the 'better search' is committed.
        Hide
        Jonathan Ellis added a comment -

        I can add a comment (and fix the whitespace) during commit if there is no other remark/problem.

        SGTM.

        (For the record, I think that's awful behavior on the part of ListIterator.

        Show
        Jonathan Ellis added a comment - I can add a comment (and fix the whitespace) during commit if there is no other remark/problem. SGTM. (For the record, I think that's awful behavior on the part of ListIterator.
        Hide
        Sylvain Lebresne added a comment -

        Committed. Closing this as said above.

        Show
        Sylvain Lebresne added a comment - Committed. Closing this as said above.
        Hide
        Hudson added a comment -

        Integrated in Cassandra #1248 (See https://builds.apache.org/job/Cassandra/1248/)
        Improve memtable slice iteration performance
        patch by slebresne; reviewed by jbellis for CASSANDRA-3545

        slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211999
        Files :

        • /cassandra/trunk/CHANGES.txt
        • /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/ArrayBackedSortedColumns.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/CollationController.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/ISortedColumns.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/Memtable.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/RowIteratorFactory.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/ThreadSafeSortedColumns.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/filter/IFilter.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/filter/NamesQueryFilter.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
        • /cassandra/trunk/src/java/org/apache/cassandra/db/index/keys/KeysSearcher.java
        • /cassandra/trunk/src/java/org/apache/cassandra/service/RowRepairResolver.java
        • /cassandra/trunk/test/unit/org/apache/cassandra/db/ArrayBackedSortedColumnsTest.java
        Show
        Hudson added a comment - Integrated in Cassandra #1248 (See https://builds.apache.org/job/Cassandra/1248/ ) Improve memtable slice iteration performance patch by slebresne; reviewed by jbellis for CASSANDRA-3545 slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211999 Files : /cassandra/trunk/CHANGES.txt /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java /cassandra/trunk/src/java/org/apache/cassandra/db/ArrayBackedSortedColumns.java /cassandra/trunk/src/java/org/apache/cassandra/db/CollationController.java /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java /cassandra/trunk/src/java/org/apache/cassandra/db/ISortedColumns.java /cassandra/trunk/src/java/org/apache/cassandra/db/Memtable.java /cassandra/trunk/src/java/org/apache/cassandra/db/RowIteratorFactory.java /cassandra/trunk/src/java/org/apache/cassandra/db/ThreadSafeSortedColumns.java /cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java /cassandra/trunk/src/java/org/apache/cassandra/db/filter/IFilter.java /cassandra/trunk/src/java/org/apache/cassandra/db/filter/NamesQueryFilter.java /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java /cassandra/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java /cassandra/trunk/src/java/org/apache/cassandra/db/index/keys/KeysSearcher.java /cassandra/trunk/src/java/org/apache/cassandra/service/RowRepairResolver.java /cassandra/trunk/test/unit/org/apache/cassandra/db/ArrayBackedSortedColumnsTest.java

          People

          • Assignee:
            Sylvain Lebresne
            Reporter:
            Evgeny Ryabitskiy
            Reviewer:
            Jonathan Ellis
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development