CASSANDRA-12674

[SASI] Confusing AND/OR semantics for StandardAnalyzer



    • Type: Bug
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Feature/SASI
    • Environment:

      Cassandra 3.7



      Connected to Test Cluster at
      [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
      Use HELP for help.
      cqlsh> use test;
      cqlsh:test> CREATE TABLE sasi_bug(id int, clustering int, val text, PRIMARY KEY((id), clustering));
      cqlsh:test> CREATE CUSTOM INDEX ON sasi_bug(val) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
          'mode': 'CONTAINS',
           'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
          'analyzed': 'true'};
      //1st example SAME PARTITION KEY
      cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 1, 'homeworker');
      cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 2, 'hardworker');
      cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%work home%';
       id | clustering | val
        1 |          1 | homeworker
        1 |          2 | hardworker
      (2 rows)
      //2nd example DIFFERENT PARTITION KEY
      cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(10, 1, 'speedrun');
      cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(11, 1, 'longrun');
      cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%long run%';
       id | clustering | val
       11 |          1 | longrun
      (1 rows)

      In the 1st example, both rows belong to the same partition, so SASI returns both of them. LIKE '%work home%' means "contains 'work' OR 'home'", so the result makes sense.

      In the 2nd example, only one row is returned, whereas 2 rows are expected: LIKE '%long run%' means "contains 'long' OR 'run'", so speedrun should be returned too.
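The expected OR semantics for both examples can be sketched as follows. This is only an illustration, not SASI code: the class and method names are hypothetical, and splitting the search string on whitespace is an assumed simplification of what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExpectedOrSemantics {
    // Assumption: the search string is split on whitespace only;
    // the real StandardAnalyzer applies more processing than this.
    static List<String> terms(String search) {
        return Arrays.asList(search.trim().split("\\s+"));
    }

    // A value is expected to match when it contains at least one term (OR).
    static List<String> expectedMatches(List<String> values, String search) {
        List<String> searchTerms = terms(search);
        List<String> out = new ArrayList<>();
        for (String value : values) {
            for (String term : searchTerms) {
                if (value.contains(term)) {
                    out.add(value);
                    break; // one matching term is enough
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("homeworker", "hardworker", "speedrun", "longrun");
        System.out.println(expectedMatches(values, "work home")); // [homeworker, hardworker]
        System.out.println(expectedMatches(values, "long run"));  // [speedrun, longrun]
    }
}
```

Under this reading, the second query should return both speedrun and longrun, which is not what cqlsh shows.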

      So where is the problem? Explanation:

      When there is only 1 predicate, the root operation type is AND:

          private Operation analyze()
                  Operation.Builder and = new Operation.Builder(OperationType.AND, controller);
                  return and.complete();

      While parsing LIKE '%long run%', SASI creates 2 expressions for the searched terms, long and run, which corresponds to OR logic. However, this piece of code discards the OR logic:

              public Operation complete()
                  if (!expressions.isEmpty())
                      ListMultimap<ColumnDefinition, Expression> analyzedExpressions = analyzeGroup(controller, op, expressions);
                      RangeIterator.Builder<Long, Token> range = controller.getIndexes(op, analyzedExpressions.values());

      As you can see, we blindly take all the values of the Multimap (which contains a single entry for the val column with 2 expressions) and pass them to controller.getIndexes(...):

          public RangeIterator.Builder<Long, Token> getIndexes(OperationType op, Collection<Expression> expressions)
              if (resources.containsKey(expressions))
                  throw new IllegalArgumentException("Can't process the same expressions multiple times.");
              RangeIterator.Builder<Long, Token> builder = op == OperationType.OR
                                                      ? RangeUnionIterator.<Long, Token>builder()
                                                      : RangeIntersectionIterator.<Long, Token>builder();

      Because the root operation has type AND, RangeIntersectionIterator is used on both expressions, long and run.

      So when the data belong to different partitions, the AND logic applies: only the partition holding longrun matches both long and run, and speedrun is eliminated.
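The effect on the 2nd example can be reproduced with a small model. This is a sketch under stated assumptions, not SASI's implementation: the index is reduced to a hypothetical map from search term to the partition keys whose val contains that term, and plain set intersection and union stand in for RangeIntersectionIterator and RangeUnionIterator.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SasiRangeSketch {
    // Hypothetical stand-in for the index: term -> partition keys whose value contains the term.
    static Map<String, Set<Integer>> buildIndex(Map<Integer, String> valuesByPartition, List<String> terms) {
        Map<String, Set<Integer>> index = new LinkedHashMap<>();
        for (String term : terms) {
            Set<Integer> keys = new TreeSet<>();
            for (Map.Entry<Integer, String> entry : valuesByPartition.entrySet())
                if (entry.getValue().contains(term))
                    keys.add(entry.getKey());
            index.put(term, keys);
        }
        return index;
    }

    // AND: keep only the partitions present in every per-term set.
    static Set<Integer> intersect(Collection<Set<Integer>> sets) {
        Iterator<Set<Integer>> it = sets.iterator();
        if (!it.hasNext())
            return new TreeSet<>();
        Set<Integer> result = new TreeSet<>(it.next());
        while (it.hasNext())
            result.retainAll(it.next());
        return result;
    }

    // OR: keep the partitions present in any per-term set.
    static Set<Integer> union(Collection<Set<Integer>> sets) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> set : sets)
            result.addAll(set);
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new TreeMap<>();
        data.put(10, "speedrun");
        data.put(11, "longrun");
        Map<String, Set<Integer>> index = buildIndex(data, Arrays.asList("long", "run"));
        System.out.println("AND: " + intersect(index.values())); // AND: [11]
        System.out.println("OR:  " + union(index.values()));     // OR:  [10, 11]
    }
}
```

In this model, long maps to partition 11 and run maps to partitions 10 and 11, so the intersection keeps only partition 11 while the union would keep both.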

      When the data belong to the same partition but to different rows, RangeIntersectionIterator returns a single partition; the rows are then filtered further by operationTree.satisfiedBy, and the results are correct:

                  while (currentKeys.hasNext())
                          DecoratedKey key = currentKeys.next();
                          if (!keyRange.right.isMinimum() && keyRange.right.compareTo(key) < 0)
                              return endOfData();
                          try (UnfilteredRowIterator partition = controller.getPartition(key, executionController))
                              Row staticRow = partition.staticRow();
                              List<Unfiltered> clusters = new ArrayList<>();
                              while (partition.hasNext())
                                  Unfiltered row = partition.next();
                                  if (operationTree.satisfiedBy(row, staticRow, true))
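The two phases described above can be modelled together to show why the same-partition case looks correct while the different-partition case does not. This is a simplified sketch with hypothetical names; in particular, it assumes that the row-level check accepts a row matching any of the analyzed terms, which is what the cqlsh output of the 1st example suggests but is not verified against the satisfiedBy implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SasiTwoPhaseSketch {
    // Hypothetical row model: partition key, clustering column, indexed value.
    static final class Row {
        final int id;
        final int clustering;
        final String val;

        Row(int id, int clustering, String val) {
            this.id = id;
            this.clustering = clustering;
            this.val = val;
        }

        @Override
        public String toString() {
            return id + "/" + clustering + "/" + val;
        }
    }

    // Assumed row-level check: a row passes when it contains any term.
    static boolean containsAny(String val, List<String> terms) {
        for (String term : terms)
            if (val.contains(term))
                return true;
        return false;
    }

    // Phase 1: partition-level intersection. A partition survives only if,
    // for every term, at least one of its rows contains that term.
    static Set<Integer> partitionsMatchingAllTerms(List<Row> rows, List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> keys = new TreeSet<>();
            for (Row row : rows)
                if (row.val.contains(term))
                    keys.add(row.id);
            if (result == null)
                result = keys;
            else
                result.retainAll(keys);
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Phase 2: row-level filtering inside the surviving partitions.
    static List<Row> query(List<Row> rows, List<String> terms) {
        Set<Integer> partitions = partitionsMatchingAllTerms(rows, terms);
        List<Row> out = new ArrayList<>();
        for (Row row : rows)
            if (partitions.contains(row.id) && containsAny(row.val, terms))
                out.add(row);
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(1, 1, "homeworker"), new Row(1, 2, "hardworker"),
            new Row(10, 1, "speedrun"), new Row(11, 1, "longrun"));
        System.out.println(query(rows, Arrays.asList("work", "home"))); // [1/1/homeworker, 1/2/hardworker]
        System.out.println(query(rows, Arrays.asList("long", "run")));  // [11/1/longrun]
    }
}
```

Partition 1 survives the intersection because homeworker alone contains both work and home, so hardworker is still visited in the second phase. Partition 10 never reaches the second phase, so speedrun cannot be returned.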

      /cc Pavel Yaskevich Alex Petrov





              • Assignee:
                ifesdjeen Alex Petrov
                doanduyhai DuyHai Doan
                Alex Petrov


                • Created: