CouchDB has hard-coded functionality for grouping. From the user's point of view: group_level=N will truncate Array keys to the first N elements, and that's it.
It would be wonderful if application-specific grouping functions could be added. Useful examples include:
- for string keys, truncate to the first N characters (e.g. group by first 3 letters of surname)
- for numeric keys, trunc(k/N) (e.g. dividing by 100 would give you buckets of 0..99, 100..199, 200..299, etc.)
- combine with group_level: e.g. truncate array to first two elements plus the third element divided by 100
["string1","string2",Number,"rest"] => ["string1","string2",trunc(Number/100)]
- for numeric keys: use trunc(log(V) * N) for exponential buckets
- for hexadecimal-string keys: right-shift N places
In each case N would be a parameter chosen at query time, like group_level is now.
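As a rough illustration of the examples above, here is a minimal Python sketch of such truncation functions. CouchDB itself would need these as Erlang functions; every function name here is hypothetical, and the hexadecimal case assumes that "right-shift N places" means a shift by N bits on fixed-width keys (the result is zero-padded so string ordering is kept).

```python
import math

def trunc_string(key, n):
    # Keep only the first n characters, e.g. first 3 letters of a surname.
    return key[:n]

def trunc_numeric(key, n):
    # trunc(k/N): with n=100, keys 0..99 map to 0, 100..199 map to 1, etc.
    return math.trunc(key / n)

def trunc_array_mixed(key, n):
    # Keep the first two elements, then bucket the third by n.
    # ["string1", "string2", Number, "rest"] => ["string1", "string2", trunc(Number/n)]
    return key[:2] + [math.trunc(key[2] / n)]

def trunc_log(value, n):
    # Exponential buckets: trunc(log(V) * N). Only defined for value > 0.
    return math.trunc(math.log(value) * n)

def trunc_hex(key, n):
    # Assumption: shift right by n bits and zero-pad to the original width,
    # so that string comparison of the results still matches numeric order.
    return format(int(key, 16) >> n, "0{}x".format(len(key)))
```

In each function the second argument plays the role of N, which would be supplied at query time.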
It would be sufficient just to have a hook to statically link Erlang functions to do this. There would then need to be two new HTTP parameters: one to choose the grouping function and one for any arguments it needs.
Note: group truncation functions would need to meet certain constraints to work with the grouping logic. Something like:
K1 <= K2 implies grouptrunc(K1) <= grouptrunc(K2)
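A candidate function can be spot-checked against this constraint on a sample of keys. The sketch below is only an illustration in Python: the helper name preserves_order is hypothetical, and it uses Python's own ordering rather than CouchDB's view collation, so it is a sanity check and not a proof.

```python
def preserves_order(grouptrunc, keys):
    # Sort the sample keys, truncate each one, and verify that the
    # truncated keys never decrease: K1 <= K2 implies
    # grouptrunc(K1) <= grouptrunc(K2).
    ordered = sorted(keys)
    truncated = [grouptrunc(k) for k in ordered]
    return all(a <= b for a, b in zip(truncated, truncated[1:]))

# Integer division into buckets of 100 keeps the order of the keys.
ok = preserves_order(lambda k: k // 100, range(0, 1000, 7))

# Taking the remainder does not: 98 maps to 98 but 105 maps to 5.
bad = preserves_order(lambda k: k % 100, range(0, 1000, 7))
```

A function that fails this check would scatter the members of one group across non-adjacent rows, so grouping over the sorted view could not collect them in a single pass.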
The current grouping isn't implemented exactly like that, though. As far as I can see, there's one function that compares keys for equality by looking at the first N elements (GroupRowsFun), and another that truncates them when emitting them (RespFun). For adding bolt-on functions it would be more convenient just to define a single group key truncation function.
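To show why a single function would be enough, here is a hedged Python sketch in which both roles are derived from one truncation function. The names same_group and emit_key are hypothetical stand-ins for what GroupRowsFun and RespFun appear to do; this is not CouchDB's actual code, and it assumes rows arrive already sorted by key.

```python
def make_group_funs(grouptrunc):
    # Both roles come from the one truncation function.
    def same_group(k1, k2):
        # Equality test on truncated keys (the comparison role).
        return grouptrunc(k1) == grouptrunc(k2)

    def emit_key(k):
        # Key shown in the response (the truncation-on-emit role).
        return grouptrunc(k)

    return same_group, emit_key

def group_reduce(rows, grouptrunc, reduce_fun):
    # rows: list of (key, value) pairs, already sorted by key.
    same_group, emit_key = make_group_funs(grouptrunc)
    out, current = [], []
    for key, value in rows:
        # Close the current group when the next key falls outside it.
        if current and not same_group(current[0][0], key):
            out.append((emit_key(current[0][0]),
                        reduce_fun([v for _, v in current])))
            current = []
        current.append((key, value))
    if current:
        out.append((emit_key(current[0][0]),
                    reduce_fun([v for _, v in current])))
    return out

rows = [(5, 1), (42, 1), (150, 1), (199, 1), (305, 1)]
result = group_reduce(rows, lambda k: k // 100, sum)
```

Here result holds one row per bucket of 100, with the bucket number as the emitted key and the count of rows in that bucket as the value.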