Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-8717

Top-k queries with custom secondary indexes

    Details

      Description

      As presented in Cassandra Summit Europe 2014, secondary indexes can be modified to support general top-k queries with minimum changes in Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting by columns, etc.

      Top-k queries retrieve the k best results for a certain query. That implies querying the k best rows in each token range and then sort them in order to obtain the k globally best rows.

      For doing that, we propose two additional methods in class SecondaryIndexSearcher:

      public boolean requiresFullScan(List<IndexExpression> clause)
      {
          return false;
      }
      
      public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
      {
          return rows;
      }
      

      The first one indicates if a query performed in the index requires querying all the nodes in the ring. It is necessary in top-k queries because we do not know which node are the best results. The second method specifies how to sort all the partial node results according to the query.

      Then we add two similar methods to the class AbstractRangeCommand:

          this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);
      
      public boolean requiresFullScan() {
          return searcher == null ? false : searcher.requiresFullScan(rowFilter);
      }
      
      public List<Row> combine(List<Row> rows)
      {
          return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
      }
      

      Finnally, we modify StorageProxy#getRangeSlice to use the previous method, as shown in the attached patch.

      We think that the proposed approach provides very useful functionality with minimum impact in current codebase.

      1. 0001-Add-support-for-top-k-queries-in-2i.patch
        7 kB
        Andrés de la Peña
      2. 0002-Add-support-for-top-k-queries-in-2i.patch
        11 kB
        Andrés de la Peña
      3. 0003-Add-support-for-top-k-queries-in-2i.patch
        11 kB
        Andrés de la Peña
      4. 0004-Add-support-for-top-k-queries-in-2i.patch
        18 kB
        Andrés de la Peña
      5. 8717-follow-up-2.1.txt
        1.0 kB
        Sam Tunnicliffe
      6. 8717-v5.txt
        17 kB
        Sam Tunnicliffe

        Activity

        Hide
        rstrickland Robbie Strickland added a comment -

        Prior to this patch being submitted, I went through this same exercise and patched 2.1 mainline with these changes. I couldn't see where it broke anything, and it allows users to drop in Stratio's (or their own) custom index implementation. This is a big win!

        Show
        rstrickland Robbie Strickland added a comment - Prior to this patch being submitted, I went through this same exercise and patched 2.1 mainline with these changes. I couldn't see where it broke anything, and it allows users to drop in Stratio's (or their own) custom index implementation. This is a big win!
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        It's unlikely to get into the 2.1 line, sorry.

        Maybe 3.0, it at all, and should be done on top of CASSANDRA-8099, if at all.

        Show
        iamaleksey Aleksey Yeschenko added a comment - It's unlikely to get into the 2.1 line, sorry. Maybe 3.0, it at all, and should be done on top of CASSANDRA-8099 , if at all.
        Hide
        rstrickland Robbie Strickland added a comment -

        Aleksey Yeschenko Have you looked at the patch? There's barely anything to it, and yet it opens up the door for guys like Stratio to plug in more advanced index implementations without breaking anything (i.e. no need for their fork, which is a good thing). Plus who knows when 3.0 will go mainstream? I think you should reconsider, or at least get some other input.

        Show
        rstrickland Robbie Strickland added a comment - Aleksey Yeschenko Have you looked at the patch? There's barely anything to it, and yet it opens up the door for guys like Stratio to plug in more advanced index implementations without breaking anything (i.e. no need for their fork, which is a good thing). Plus who knows when 3.0 will go mainstream? I think you should reconsider, or at least get some other input.
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        See also - CASSANDRA-7017. @jhaliday Might have something to add, too.

        It is a small patch, but it does touch the internals, subtly changing behavior that may or may nor be taken into account by the rest of C* codebase. My autopilot reaction is to say 'no' to any potentially breaking changes when it comes to minor C* releases.

        The instabilities we had with the 2.1 line so far (hopefully in the past) make me be even more careful and more aggressive about pushing stuff to 'Later'.

        Show
        iamaleksey Aleksey Yeschenko added a comment - See also - CASSANDRA-7017 . @jhaliday Might have something to add, too. It is a small patch, but it does touch the internals, subtly changing behavior that may or may nor be taken into account by the rest of C* codebase. My autopilot reaction is to say 'no' to any potentially breaking changes when it comes to minor C* releases. The instabilities we had with the 2.1 line so far (hopefully in the past) make me be even more careful and more aggressive about pushing stuff to 'Later'.
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        That's Jonathan Halliday, sorry.

        Show
        iamaleksey Aleksey Yeschenko added a comment - That's Jonathan Halliday , sorry.
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        Robbie Strickland Andrés de la Peña I'll have another look at the patch, soon-ish.

        CC Sylvain Lebresne

        Show
        iamaleksey Aleksey Yeschenko added a comment - Robbie Strickland Andrés de la Peña I'll have another look at the patch, soon-ish. CC Sylvain Lebresne
        Hide
        ernestof2 Ernesto Funes added a comment -

        Any news about this issue?

        Show
        ernestof2 Ernesto Funes added a comment - Any news about this issue?
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        On second thought, this looks reasonable enough for at least 3.0 inclusion - especially if this eventually allows you guys to get rid of that C* fork.

        Still, I want to hear from Sylvain Lebresne and Sam Tunnicliffe, the latter planning to do some C* API refactoring for a while, before proceeding.

        Show
        iamaleksey Aleksey Yeschenko added a comment - On second thought, this looks reasonable enough for at least 3.0 inclusion - especially if this eventually allows you guys to get rid of that C* fork. Still, I want to hear from Sylvain Lebresne and Sam Tunnicliffe , the latter planning to do some C* API refactoring for a while, before proceeding.
        Hide
        rstrickland Robbie Strickland added a comment -

        FWIW, I spoke with several other teams at Spark Summit last week that would
        really like this patch for the same reason.

        Show
        rstrickland Robbie Strickland added a comment - FWIW, I spoke with several other teams at Spark Summit last week that would really like this patch for the same reason.
        Hide
        adelapena Andrés de la Peña added a comment -

        On our side, we would very much like to abandon the fork and distribute our index as a plugin once you guys agree that the proposed changes regarding top-K queries are a go.

        Show
        adelapena Andrés de la Peña added a comment - On our side, we would very much like to abandon the fork and distribute our index as a plugin once you guys agree that the proposed changes regarding top-K queries are a go.
        Hide
        slebresne Sylvain Lebresne added a comment -

        I don't have a problem with this in theory, at least in 3.0 (I tend to agree with Aleksey on that part), though I could argue that what you fundamentally ask is not specific to indexing. What you want is a way to "transform" the result of internal queries. It's rather close to aggregation except that instead of transforming multiple rows into a single, you want to transform some rows into other rows (sorting them being just one particular use case of that). The fact that the results you want to transform is the result of your custom index is kind of incidental. So I do feel that implementing this as the more general concept of results transformation would be cleaner (and more generic). However, doing so is probably a little bit more involved so I'm happy to "hijack" the 2ndary index API for that in the short term and leave generalization to later, provided we agree that we may generalize that better and thus slightly break those new APIs.

        Now on the patch, I do think requiresFullScan somewhat break the concurrencyFactor computation in getRangeSlice as remainingRows can become negative. This is not a huge deal in the sense that the code ensure the concurrentFactor is never smaller than 1, but it still is kind of wrong in principle. In fact, that method is really about modifying the query limit internally (up until the combine method has been applied), and that's imo the proper way to expose it.

        Another nit is that we should rename the sort method in something more generic (as said above, sorting is somewhat of a special case and no reason to imply a limitation to that). It could be renamed combine or, imo a bit better, something like postReconciliationProcessing.

        Show
        slebresne Sylvain Lebresne added a comment - I don't have a problem with this in theory, at least in 3.0 (I tend to agree with Aleksey on that part), though I could argue that what you fundamentally ask is not specific to indexing. What you want is a way to "transform" the result of internal queries. It's rather close to aggregation except that instead of transforming multiple rows into a single, you want to transform some rows into other rows (sorting them being just one particular use case of that). The fact that the results you want to transform is the result of your custom index is kind of incidental. So I do feel that implementing this as the more general concept of results transformation would be cleaner (and more generic). However, doing so is probably a little bit more involved so I'm happy to "hijack" the 2ndary index API for that in the short term and leave generalization to later, provided we agree that we may generalize that better and thus slightly break those new APIs. Now on the patch, I do think requiresFullScan somewhat break the concurrencyFactor computation in getRangeSlice as remainingRows can become negative. This is not a huge deal in the sense that the code ensure the concurrentFactor is never smaller than 1, but it still is kind of wrong in principle. In fact, that method is really about modifying the query limit internally (up until the combine method has been applied), and that's imo the proper way to expose it. Another nit is that we should rename the sort method in something more generic (as said above, sorting is somewhat of a special case and no reason to imply a limitation to that). It could be renamed combine or, imo a bit better, something like postReconciliationProcessing .
        Hide
        adelapena Andrés de la Peña added a comment -

        I agree with your idea about doing this for the short term and leave generalization for later. We can deal with future API changes without problems. What we would need at 2i level is some way to specify that we need to scan all the nodes and the aforementioned method so as to combine the partial results. Indeed, "sort" is not the most fortunate name for this method...

        Show
        adelapena Andrés de la Peña added a comment - I agree with your idea about doing this for the short term and leave generalization for later. We can deal with future API changes without problems. What we would need at 2i level is some way to specify that we need to scan all the nodes and the aforementioned method so as to combine the partial results. Indeed, "sort" is not the most fortunate name for this method...
        Hide
        evanv Evan Volgas added a comment -

        The ability to drop in a custom indexer would be a game changer. I agree with Sylvain Lebresne's point about "sort" being the wrong name for this method. I also really like the idea of hijacking the 2i in the short run and revisiting it later to consider a more generalized version of transforming the result set.

        As far as 2.1 vs 3.0... this is such a big win that I'd hate to see it pushed too far back into later. Is it maybe still possible to try and get this into the 2.1 line?

        Show
        evanv Evan Volgas added a comment - The ability to drop in a custom indexer would be a game changer. I agree with Sylvain Lebresne 's point about "sort" being the wrong name for this method. I also really like the idea of hijacking the 2i in the short run and revisiting it later to consider a more generalized version of transforming the result set. As far as 2.1 vs 3.0... this is such a big win that I'd hate to see it pushed too far back into later. Is it maybe still possible to try and get this into the 2.1 line?
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        Once the issues raised by Sylvain are resolved, we can put in into 2.1 as an interim measure.

        Show
        iamaleksey Aleksey Yeschenko added a comment - Once the issues raised by Sylvain are resolved, we can put in into 2.1 as an interim measure.
        Hide
        adelapena Andrés de la Peña added a comment -

        It would be nice to have these changes in 2.1. I'm uploading a new version of the patch.

        Sylvain Lebresne, I have renamed, as you suggested, the sort method to postReconciliationProcessing. Also I have renamed the method requiresFullScan to requiresScanningAllRanges, which seems clearer.

        You are totally right about the computation of the concurrency factor. I have created a rowsToBeFetched variable representing the number of rows to be fetched. This is command.limit() in the regular case and command.limit() * ranges.size() when the command requieres scanning all the token ranges. In addition, if we know that the command needs to do a full scan then we can set the concurrency factor to ranges.size() in order to query all the ranges in parallel. Thus, recalculating the concurrency factor is avoided in this particular case of full ranges scan.

        Please let me know what you think about the new patch.

        Show
        adelapena Andrés de la Peña added a comment - It would be nice to have these changes in 2.1. I'm uploading a new version of the patch. Sylvain Lebresne , I have renamed, as you suggested, the sort method to postReconciliationProcessing . Also I have renamed the method requiresFullScan to requiresScanningAllRanges , which seems clearer. You are totally right about the computation of the concurrency factor. I have created a rowsToBeFetched variable representing the number of rows to be fetched. This is command.limit() in the regular case and command.limit() * ranges.size() when the command requieres scanning all the token ranges. In addition, if we know that the command needs to do a full scan then we can set the concurrency factor to ranges.size() in order to query all the ranges in parallel. Thus, recalculating the concurrency factor is avoided in this particular case of full ranges scan. Please let me know what you think about the new patch.
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        I'll review shortly (we have a conference this week, so expect early next week most likely).

        In the meantime, can you format the patch to match the project's code style - https://wiki.apache.org/cassandra/CodeStyle ?

        Thanks

        Show
        iamaleksey Aleksey Yeschenko added a comment - I'll review shortly (we have a conference this week, so expect early next week most likely). In the meantime, can you format the patch to match the project's code style - https://wiki.apache.org/cassandra/CodeStyle ? Thanks
        Hide
        adelapena Andrés de la Peña added a comment -

        I'm sorry, I forgot the brackets placement. I have uploaded a new version complying with the code style. I have also done some minor changes in the proposed new SecondaryIndexManager's method (for getting the index searcher) in order to make it more general. Thanks.

        Show
        adelapena Andrés de la Peña added a comment - I'm sorry, I forgot the brackets placement. I have uploaded a new version complying with the code style. I have also done some minor changes in the proposed new SecondaryIndexManager 's method (for getting the index searcher) in order to make it more general. Thanks.
        Hide
        beobal Sam Tunnicliffe added a comment -

        Generally, I think this is ok for 2.1, modulo a couple of points:

        A thing that concerns me is that quite a number of AbstractRangeCommand instances are created during execution of an index query, particularly on the coordinator. The primary RangeSliceCommand is created in SelectStatement#getPageableCommand, then a PagedRangeCommand is instantiated for each page of results in RangeSliceQueryPager#queryNextPage. Then, as each of those is pushed down to StorageProxy the fan out creates another instance per sub range. The fact that each of these instances performs the selectivity calculation is a bit of a pain and somewhat wasteful, though it shouldn't be overly expensive & the alternative would involve more change than I'd be comfortable with putting into 2.1.

        What I do think should be addressed before including this in 2.1 is that quite a few more tracing events are now emitted as with tracing enabled, each time SIS#highestSelectivityPredicate is called it we log an event. Perhaps you could add a boolean argument to highestSelectivityPredicate to indicate whether or not the tracing event should be emitted. I think that only the call from the SIS#search implementations would actually want to do that, so SIS#highestSelectivityIndex could just pass false.

        This would actually produce fewer trace events than we do currently, only 1 per command per replica, but I think that would actually be more correct/useful. FTR, I know that the whole process of looking up & selecting a searcher needs seriously reworking (& I'm hoping to get chance to do that very soon).

        The patch introduces quite a bit of duplication in SecondaryIndexManager between the new getHighestSelectivityIndexSearcher method and search. It would be straightforward to refactor the latter to call the former.

        The wording of the javadoc for SIS#postReconciliationProcessing could be a little clearer. In its current form, it could be taken to imply that a number of index queries may be executed by an index searcher instance on a single node & their results combined. I think it would be better to make it clear that reconcilliation happens on the coordinator and not on the replica(s) where the search happens.

        Show
        beobal Sam Tunnicliffe added a comment - Generally, I think this is ok for 2.1, modulo a couple of points: A thing that concerns me is that quite a number of AbstractRangeCommand instances are created during execution of an index query, particularly on the coordinator. The primary RangeSliceCommand is created in SelectStatement#getPageableCommand , then a PagedRangeCommand is instantiated for each page of results in RangeSliceQueryPager#queryNextPage . Then, as each of those is pushed down to StorageProxy the fan out creates another instance per sub range. The fact that each of these instances performs the selectivity calculation is a bit of a pain and somewhat wasteful, though it shouldn't be overly expensive & the alternative would involve more change than I'd be comfortable with putting into 2.1. What I do think should be addressed before including this in 2.1 is that quite a few more tracing events are now emitted as with tracing enabled, each time SIS#highestSelectivityPredicate is called it we log an event. Perhaps you could add a boolean argument to highestSelectivityPredicate to indicate whether or not the tracing event should be emitted. I think that only the call from the SIS#search implementations would actually want to do that, so SIS#highestSelectivityIndex could just pass false. This would actually produce fewer trace events than we do currently, only 1 per command per replica, but I think that would actually be more correct/useful. FTR, I know that the whole process of looking up & selecting a searcher needs seriously reworking (& I'm hoping to get chance to do that very soon). The patch introduces quite a bit of duplication in SecondaryIndexManager between the new getHighestSelectivityIndexSearcher method and search . It would be straightforward to refactor the latter to call the former. The wording of the javadoc for SIS#postReconciliationProcessing could be a little clearer. In its current form, it could be taken to imply that a number of index queries may be executed by an index searcher instance on a single node & their results combined. I think it would be better to make it clear that reconcilliation happens on the coordinator and not on the replica(s) where the search happens.
        Hide
        adelapena Andrés de la Peña added a comment -

        I have uploaded a new version of the patch with the changes that need to be addressed before including this in 2.1.

        As you suggested, I have added a boolean argument named trace to SIS#highestSelectivityPredicate to indicate whether or not the tracing event should be emitted. It is set to true by SIS#search, and false by SIS#highestSelectivityIndex.

        To avoid duplication in SecondaryIndexManager, now the search method calls to getHighestSelectivityIndexSearcher.

        I have modified SIS#postReconciliationProcessing JavaDoc trying to make it clear that it happens on the coordinator node.

        I hope you find it OK.

        Show
        adelapena Andrés de la Peña added a comment - I have uploaded a new version of the patch with the changes that need to be addressed before including this in 2.1. As you suggested, I have added a boolean argument named trace to SIS#highestSelectivityPredicate to indicate whether or not the tracing event should be emitted. It is set to true by SIS#search , and false by SIS#highestSelectivityIndex . To avoid duplication in SecondaryIndexManager , now the search method calls to getHighestSelectivityIndexSearcher . I have modified SIS#postReconciliationProcessing JavaDoc trying to make it clear that it happens on the coordinator node. I hope you find it OK.
        Hide
        beobal Sam Tunnicliffe added a comment -

        Thanks, LGTM.

        I've attached a v5 with a few minor codestyle nits fixed and I've also added an additional Tracing#isTracing call in SIS#highestSelectivityPredicate as previously we'd go to the effort of building the string args even when tracing is not enabled.

        Aleksey Yeschenko, I'm +1 if you're happy to commit this.

        Show
        beobal Sam Tunnicliffe added a comment - Thanks, LGTM. I've attached a v5 with a few minor codestyle nits fixed and I've also added an additional Tracing#isTracing call in SIS#highestSelectivityPredicate as previously we'd go to the effort of building the string args even when tracing is not enabled. Aleksey Yeschenko , I'm +1 if you're happy to commit this.
        Hide
        iamaleksey Aleksey Yeschenko added a comment -

        Committed to 2.1 as 4c7c5be798e2a7d1e72d086bc5011242ea0173dc and to trunk. Thanks Andrés de la Peña for the patch, and Sam Tunnicliffe for review.

        Show
        iamaleksey Aleksey Yeschenko added a comment - Committed to 2.1 as 4c7c5be798e2a7d1e72d086bc5011242ea0173dc and to trunk. Thanks Andrés de la Peña for the patch, and Sam Tunnicliffe for review.
        Hide
        thobbs Tyler Hobbs added a comment -

        This caused a regression in the select_distinct_with_deletions dtest, as confirmed by git bisect. If a fix for this isn't quick, can we revert the commit until it's fixed?

        Show
        thobbs Tyler Hobbs added a comment - This caused a regression in the select_distinct_with_deletions dtest , as confirmed by git bisect. If a fix for this isn't quick, can we revert the commit until it's fixed?
        Hide
        beobal Sam Tunnicliffe added a comment -

        The regression is caused by a StorageProxy#trim being incorrectly ported to AbstractRangeCommand. The additional check of ignoredTombstonedPartitions() introduced by CASSANDRA-8490 was omitted, causing a tombstoned partition in the results to be counted towards the limit. I've attached a patch to fix it & have pushed test branches to my repo:

        https://github.com/beobal/cassandra/tree/8717-follow-up-2.1
        https://github.com/beobal/cassandra/tree/8717-follow-up-trunk

        Cassci should pick these up and validate them shortly, at which point they'll appear here: http://cassci.datastax.com/view/Dev/view/beobal/

        Show
        beobal Sam Tunnicliffe added a comment - The regression is caused by a StorageProxy#trim being incorrectly ported to AbstractRangeCommand . The additional check of ignoredTombstonedPartitions() introduced by CASSANDRA-8490 was omitted, causing a tombstoned partition in the results to be counted towards the limit. I've attached a patch to fix it & have pushed test branches to my repo: https://github.com/beobal/cassandra/tree/8717-follow-up-2.1 https://github.com/beobal/cassandra/tree/8717-follow-up-trunk Cassci should pick these up and validate them shortly, at which point they'll appear here: http://cassci.datastax.com/view/Dev/view/beobal/
        Hide
        thobbs Tyler Hobbs added a comment -

        The patch looks good and the tests show that the regression has been fixed, so +1, committed to 2.1 as ce3ce44a5379f7de162b059fb99641fc4e6a344a and merged to trunk. Thanks for the quick fix!

        Show
        thobbs Tyler Hobbs added a comment - The patch looks good and the tests show that the regression has been fixed, so +1, committed to 2.1 as ce3ce44a5379f7de162b059fb99641fc4e6a344a and merged to trunk. Thanks for the quick fix!

          People

          • Assignee:
            adelapena Andrés de la Peña
            Reporter:
            adelapena Andrés de la Peña
            Reviewer:
            Sam Tunnicliffe
          • Votes:
            10 Vote for this issue
            Watchers:
            18 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development