Issue CASSANDRA-8717 (project: Cassandra)

Top-k queries with custom secondary indexes

    Details

      Description

      As presented at Cassandra Summit Europe 2014, secondary indexes can be modified to support general top-k queries with minimal changes to the Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting by columns, etc.

      Top-k queries retrieve the k best results for a given query. This implies querying the k best rows in each token range and then sorting them to obtain the k globally best rows.
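      The two-phase idea (local top-k per token range, then a global merge) can be sketched in plain Java. This is a minimal, self-contained illustration, not Cassandra code: ScoredRow, TopKMerge, localTopK and globalTopK are hypothetical names, and each row is assumed to carry a numeric relevance score where higher is better. Because every globally best row is also among the k best rows of its own token range, asking each range for only k rows is enough.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a Cassandra row: a key plus a relevance score.
record ScoredRow(String key, double score) {}

final class TopKMerge {
    private static final Comparator<ScoredRow> BY_SCORE =
        Comparator.comparingDouble(ScoredRow::score);

    // Phase 1: keep the k highest-scoring rows of one token range.
    static List<ScoredRow> localTopK(List<ScoredRow> rows, int k) {
        // Min-heap bounded to k entries: its head is the worst row kept so far.
        PriorityQueue<ScoredRow> heap = new PriorityQueue<>(BY_SCORE);
        for (ScoredRow row : rows) {
            heap.offer(row);
            if (heap.size() > k) {
                heap.poll(); // evict the current worst row
            }
        }
        List<ScoredRow> best = new ArrayList<>(heap);
        best.sort(BY_SCORE.reversed()); // best first
        return best;
    }

    // Phase 2: merge the partial results of all ranges and keep the k globally best.
    static List<ScoredRow> globalTopK(List<List<ScoredRow>> partials, int k) {
        List<ScoredRow> all = new ArrayList<>();
        for (List<ScoredRow> partial : partials) {
            all.addAll(partial);
        }
        return localTopK(all, k);
    }
}
```

      A bounded min-heap keeps each phase at O(n log k) and never holds more than k rows per range in memory.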

      To do that, we propose adding two methods to the class SecondaryIndexSearcher:

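      public boolean requiresFullScan(List<IndexExpression> clause)
      {
          // Default: a 2i query does not need to contact every node in the ring.
          return false;
      }

      public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
      {
          // Default: leave the partial results in the order they were collected.
          return rows;
      }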
      

      The first method indicates whether a query performed on the index requires querying all the nodes in the ring. This is necessary for top-k queries because we do not know in advance which nodes hold the best results. The second method specifies how to sort the partial results returned by all the nodes according to the query.
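      To make the two hooks concrete, here is a hypothetical relevance-search searcher that overrides both. It is a sketch under stated assumptions: Expression, ScoredRow, SearcherSketch and RelevanceSearcher are simplified, invented stand-ins for IndexExpression, Row and SecondaryIndexSearcher, and rows are assumed to expose a score. The base class keeps the proposed defaults, so ordinary indexes are unaffected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-ins for Cassandra's IndexExpression and Row.
record Expression(String column, String value) {}
record ScoredRow(String key, double score) {}

// Mirrors the two proposed default hooks on SecondaryIndexSearcher.
abstract class SearcherSketch {
    public boolean requiresFullScan(List<Expression> clause) {
        return false; // default: ordinary 2i queries keep the current behaviour
    }

    public List<ScoredRow> sort(List<Expression> clause, List<ScoredRow> rows) {
        return rows; // default: keep rows in the order they arrived
    }
}

// Hypothetical relevance-search index: any query on its column is a top-k query.
final class RelevanceSearcher extends SearcherSketch {
    private final String indexedColumn;

    RelevanceSearcher(String indexedColumn) {
        this.indexedColumn = indexedColumn;
    }

    @Override
    public boolean requiresFullScan(List<Expression> clause) {
        // The best rows may live on any node, so every node must be asked.
        return clause.stream().anyMatch(e -> e.column().equals(indexedColumn));
    }

    @Override
    public List<ScoredRow> sort(List<Expression> clause, List<ScoredRow> rows) {
        // Order the merged partial results from best to worst score.
        Comparator<ScoredRow> byScore = Comparator.comparingDouble(ScoredRow::score);
        List<ScoredRow> sorted = new ArrayList<>(rows);
        sorted.sort(byScore.reversed());
        return sorted;
    }
}
```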

      Then we add a searcher field and two corresponding methods to the class AbstractRangeCommand:

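          // Set once (e.g. in the constructor): the index searcher for this row filter;
          // it may be null, e.g. when no secondary index applies to the filter.
          this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);

      // True if the index needs results from every node, as top-k queries do.
      public boolean requiresFullScan()
      {
          return searcher != null && searcher.requiresFullScan(rowFilter);
      }

      // Sorts the merged partial results with the index searcher (if any), then trims them.
      public List<Row> combine(List<Row> rows)
      {
          return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
      }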
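      The combine step can be modelled generically: an optional index-specific sort followed by a trim to the query limit. This is a hypothetical sketch (CombineSketch, the sorter parameter and limit are invented names); it assumes that trim cuts the list down to the command's row limit, which the snippet above does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of AbstractRangeCommand#combine: optional sort, then trim.
final class CombineSketch {
    static <R> List<R> combine(List<R> rows, UnaryOperator<List<R>> sorter, int limit) {
        // With no searcher (sorter == null) rows are only trimmed, as before the change.
        List<R> ordered = sorter == null ? rows : sorter.apply(rows);
        return trim(ordered, limit);
    }

    // Assumed behaviour of trim(rows): keep at most 'limit' rows, in order.
    static <R> List<R> trim(List<R> rows, int limit) {
        return rows.size() <= limit ? rows : new ArrayList<>(rows.subList(0, limit));
    }
}
```

      Sorting before trimming matters: trimming first would keep whichever rows happened to arrive first rather than the best ones.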
      

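      Finally, we modify StorageProxy#getRangeSlice to use the previous methods, so that it queries every token range when requiresFullScan() returns true and passes the collected rows through combine(), as shown in the attached patch.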
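      The effect on the range-scan loop can be outlined as follows. This is not the actual StorageProxy code: RangeScanSketch, queryRange and the string range identifiers are hypothetical, and, as a simplification, the existing loop is assumed to stop as soon as it has gathered limit rows. A top-k query must not stop early, because a range visited later may still hold better rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical outline of the range-scan loop; not the actual StorageProxy code.
final class RangeScanSketch {
    static <R> List<R> getRangeSlice(List<String> ranges,
                                     Function<String, List<R>> queryRange,
                                     boolean requiresFullScan,
                                     UnaryOperator<List<R>> combine,
                                     int limit) {
        List<R> rows = new ArrayList<>();
        for (String range : ranges) {
            rows.addAll(queryRange.apply(range));
            // Ordinary scans may stop once enough rows are collected;
            // full (top-k) scans visit every range.
            if (!requiresFullScan && rows.size() >= limit) {
                break;
            }
        }
        // Let the index sort the partial results, then trim to the limit.
        return combine.apply(rows);
    }
}
```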

      We think that the proposed approach provides very useful functionality with minimal impact on the current codebase.

        Attachments

        1. 8717-v5.txt (17 kB, Sam Tunnicliffe)
        2. 8717-follow-up-2.1.txt (1.0 kB, Sam Tunnicliffe)
        3. 0004-Add-support-for-top-k-queries-in-2i.patch (18 kB, Andrés de la Peña)
        4. 0003-Add-support-for-top-k-queries-in-2i.patch (11 kB, Andrés de la Peña)
        5. 0002-Add-support-for-top-k-queries-in-2i.patch (11 kB, Andrés de la Peña)
        6. 0001-Add-support-for-top-k-queries-in-2i.patch (7 kB, Andrés de la Peña)


            People

            • Assignee: Andrés de la Peña (adelapena)
              Reporter: Andrés de la Peña (adelapena)
              Reviewer: Sam Tunnicliffe
            • Votes: 10
              Watchers: 18
