> This isn't possible with the "pretend index is a supercolumn row" approach.
I'm not sure that I understand why... can you give an example? The key in the pseudo-CF would be the original indexed value, and each top-level column in the index row would be a row from the base CF (from one node), so filtering within the base row could be applied locally on each node.
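A minimal Python sketch of the layout described above, using plain dicts in place of real column families. The names `base_cf`, `build_index_cf` and `filter_index_row` are hypothetical, and storing a copy of each base row inside the index row is an assumption made only for illustration. The point it shows is that a column filter applied to one index row never needs data from any other row.

```python
from collections import defaultdict

# Hypothetical base column family: row key -> {column name: value}.
base_cf = {
    "ben": {"age": 27, "city": "Austin"},
    "george": {"age": 27, "city": "Boston"},
    "alice": {"age": 31, "city": "Austin"},
}


def build_index_cf(base, indexed_column):
    """Build a pseudo-CF for one indexed column.

    Row key = indexed value; each top-level column of an index row is one
    base row (keyed by its base row key), much like a supercolumn.
    Copying the base columns into the index is an assumption of this sketch.
    """
    index = defaultdict(dict)
    for row_key, columns in base.items():
        if indexed_column in columns:
            index[columns[indexed_column]][row_key] = dict(columns)
    return dict(index)


def filter_index_row(index_row, column_names):
    """Apply a column predicate inside every base row of one index row.

    Each base row is self-contained, so the filter can run wherever that
    row lives without consulting other rows.
    """
    return {
        row_key: {name: value for name, value in columns.items() if name in column_names}
        for row_key, columns in index_row.items()
    }


age_index = build_index_cf(base_cf, "age")
print(filter_index_row(age_index[27], {"city"}))
# {'ben': {'city': 'Austin'}, 'george': {'city': 'Boston'}}
```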
> multiget(rowpredicate, columnpredicate)
The rowpredicate containing an "index scan" parameter is very interesting, and it does make it clear which operations are slow. But I can easily imagine a situation where someone wants to use both a "named keys" and an "index scan" rowpredicate at once, which would still be very efficient but would require a list<rowpredicate>.
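To make the list<rowpredicate> idea concrete, here is a small Python sketch that ANDs a list of row predicates together. `NamedKeys`, `IndexScan`, `matching_rows` and the `indexes` dict (column to indexed value to set of row keys) are hypothetical stand-ins, not an existing Cassandra API. Each predicate is answered from a key list or an index lookup, so combining them never requires a full scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedKeys:
    """Hypothetical row predicate: rows whose keys are listed explicitly."""
    keys: frozenset


@dataclass(frozen=True)
class IndexScan:
    """Hypothetical row predicate: rows whose indexed column equals value."""
    column: str
    value: object


def matching_rows(indexes, row_predicates):
    """AND together a list of row predicates and return the matching row keys.

    `indexes` maps column -> {indexed value: set of row keys}, standing in
    for secondary indexes; every predicate is answered without a full scan.
    """
    if not row_predicates:
        raise ValueError("at least one row predicate is required")
    result = None
    for pred in row_predicates:
        if isinstance(pred, NamedKeys):
            hits = set(pred.keys)
        elif isinstance(pred, IndexScan):
            hits = indexes[pred.column].get(pred.value, set())
        else:
            raise TypeError(f"unknown row predicate: {pred!r}")
        result = set(hits) if result is None else result & hits
    return result


indexes = {"age": {27: {"ben", "george", "sam"}, 31: {"alice"}}}
preds = [IndexScan("age", 27), NamedKeys(frozenset({"ben", "george", "alice"}))]
print(sorted(matching_rows(indexes, preds)))  # ['ben', 'george']
```

Putting the index scan first in the list is only a convention here; since the predicates are intersected, the order does not change the result.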
I agree that placing the "index scan" predicate in the first position in the method call is essential, which is why I suggested the pseudo-CF api:
An interesting parallel is to compare the proposed api to Python's array slicing syntax, which is extremely elegant. I imagine that our ideal API is one that allows either named keys or a key range at every level of nesting. The following paragraphs only refer to key/name slicing, and don't go into 'value' queries.
As long as you concretely define a key or range of keys to search for at each level (such as [key1:key5][name1:name2][subname5]), your operation can run in bounded time. But to provide more flexibility, the get_range_slices method in the current API allows something like [ ? ][name5]. The question mark represents an unbounded level, which may mean a full table scan without finding 'subname5' (very dangerous, not scalable). This is one of the places where we need secondary indexes: we want columns containing any value for subname5 bunched together into an index.
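A toy Python model of the slicing analogy, assuming nested dicts stand in for rows, supercolumns and subcolumns, and that key ranges are inclusive. `select`, `query`, `ANY` and the sample data are all hypothetical. It counts how many top-level rows each query visits: the fully specified query touches only the rows in its key range, while `[ ? ][name5]` has to visit every row.

```python
import bisect

ANY = object()  # hypothetical marker for an unbounded "?" level


def select(level, predicate):
    """Yield (key, child) pairs of one nesting level that match a predicate.

    predicate is a list of named keys, a (start, end) tuple for an
    inclusive key range, or ANY. A real store keeps keys sorted already;
    sorting here is a stand-in for that.
    """
    if predicate is ANY:
        chosen = sorted(level)  # unbounded: every key at this level
    elif isinstance(predicate, tuple):
        start, end = predicate
        keys = sorted(level)
        chosen = keys[bisect.bisect_left(keys, start):bisect.bisect_right(keys, end)]
    else:
        chosen = [key for key in predicate if key in level]
    for key in chosen:
        yield key, level[key]


def query(data, predicates):
    """Apply one predicate per level of nested dicts.

    Returns (results, rows_touched); rows_touched counts the top-level rows
    visited, which grows with table size when the first level is unbounded.
    """
    results = []
    rows_touched = 0

    def walk(level, preds, path):
        nonlocal rows_touched
        for key, child in select(level, preds[0]):
            if not path:
                rows_touched += 1
            if len(preds) == 1:
                results.append((path + (key,), child))
            elif isinstance(child, dict):
                walk(child, preds[1:], path + (key,))

    walk(data, list(predicates), ())
    return results, rows_touched


# Nine rows, each with two supercolumns holding one subcolumn.
data = {f"key{i}": {"name1": {"subname5": i}, "name5": {"subname5": -i}} for i in range(1, 10)}

# [key1:key5][name1:name2][subname5]: bounded at every level.
bounded, touched = query(data, [("key1", "key5"), ("name1", "name2"), ["subname5"]])
print(touched, [value for _, value in bounded])  # 5 [1, 2, 3, 4, 5]

# [ ? ][name5]: the first level is unbounded, so every row is visited.
unbounded, touched = query(data, [ANY, ["name5"]])
print(touched)  # 9
```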
Comparing with Python's slicing syntax highlights the fact that prefix searches are always safe: by always having a parent predicate, you achieve bounded-time operations. This is why placing the "index scan" predicate in the first position is so clear.
This brings us back to the pseudo-CF API: why have three types of rowpredicate and two or more types of columnpredicate when, by asking users to define views that shuffle their data into a form that allows prefix queries, we can do something like:
... with a predicate (key range or key list) required for every level, and only the last level allowing an unbounded predicate.
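The rule in the previous sentence is simple enough to check mechanically before a query runs. A minimal Python sketch, assuming a view with a known nesting `depth` and a hypothetical `ANY` marker for the unbounded predicate; `is_valid` is an illustrative name, not part of any existing API.

```python
ANY = "?"  # hypothetical marker for an unbounded predicate


def is_valid(predicates, depth):
    """Check the proposed rule for a view with `depth` nesting levels.

    A predicate (key list or key range) is required for every level, and
    only the last level may be unbounded, so every parent level is bounded.
    """
    if len(predicates) != depth:
        return False  # a predicate is required for every level
    return all(pred != ANY for pred in predicates[:-1])


print(is_valid([("key1", "key5"), ["name1", "name2"], ANY], 3))  # True
print(is_valid([ANY, ["name5"]], 2))                             # False
```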
With this API, the "named keys" + "index scan" query I pointed out above would look like (with an indexed 'age' column):
multiget( [ predicate(key is 27), predicate(name in [ben, george]), predicate(subname is any) ] )
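A sketch of how that call could be evaluated, assuming the 'age' view is laid out as age, then base row key, then the columns of that base row. `Predicate`, `key_is`, `name_in`, `any_`, `multiget` and the sample rows are hypothetical names for this illustration, not an existing Cassandra API.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Predicate:
    """Hypothetical per-level predicate: kind is "in" (named keys) or "any"."""
    kind: str
    values: Tuple[Any, ...] = ()


def key_is(value):
    return Predicate("in", (value,))  # a single key is a one-element key list


def name_in(*names):
    return Predicate("in", tuple(names))


def any_():
    return Predicate("any")


def multiget(view, predicates):
    """Apply one predicate per nesting level; only the last may be "any"."""
    pred, rest = predicates[0], predicates[1:]
    if pred.kind == "any":
        if rest:
            raise ValueError("only the last level may be unbounded")
        keys = list(view)
    else:
        keys = [key for key in pred.values if key in view]
    return {key: multiget(view[key], rest) if rest else view[key] for key in keys}


# Hypothetical 'age' view: age -> base row key -> columns of that base row.
age_view = {
    27: {
        "ben": {"age": 27, "city": "Austin"},
        "george": {"age": 27, "city": "Boston"},
        "sam": {"age": 27, "city": "Denver"},
    },
    31: {"alice": {"age": 31, "city": "Austin"}},
}

print(multiget(age_view, [key_is(27), name_in("ben", "george"), any_()]))
# {27: {'ben': {'age': 27, 'city': 'Austin'}, 'george': {'age': 27, 'city': 'Boston'}}}
```

Because the first predicate pins a single index row, the work in this sketch is bounded by the size of that row rather than by the size of the base CF.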