Details
-
Task
-
Status: Resolved
-
Resolution: Fixed
-
3.0-Release
-
None
-
None
-
178
Description
Benchmark outline from Sudarshan Kadambi which describes a good use case for supporting an ARRAY data type. Supporting an ARRAY data type, an UNNEST built-in function, plus derived tables would allow the following standard query to be used (as opposed to creating custom aggregate functions):
select avg(v)
from (select unnest(value) v from t
where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1')
It'd be nice if you could just do an average over an array directly, but this would be non standard.
On 04/26/2013 11:17 AM, Sudarshan Kadambi (BLOOMBERG) wrote:
Hi James:
Yes, I saw the email. Thank you for this generous offer. I wanted some time to make sure the benchmark correctly represents my use case.
If you wish, here's a benchmark setup you could use:
1. 1 Billion keys in table
2. Query contains 250K randomly chosen object_ids and a single field_type (F1)
3. Aggregation done on a single attribute (A1)
4. Query:
select avg(value) where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1'
We would want the test done with and without the skip scan filter for the purpose of comparison.
The reason why I wanted some time to think about it is that the values within each attribute is a JSON number array. So an avg across 2 values is a average of the averages.
For e.g. Object_id=O1, Field_type=F1, Attrib_id=A1, Value 1:
{1,2,3,4,5,}Object_id=O2, Field_type=F1, Attrib_id=A1, Value 2:
{1,2,3,4,5}The query should produce: Avg{Avg
{Value1}, Avg{Value2}} = Avg{3,3} = 3If we were doing a sum, the query would produce: Sum{Avg{Value1}
, Avg
{Value2}= Sum
{3,3}= 6.
This might require customization of the aggregate function code.