Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-309

Support ARRAY data type

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments


    • Task
    • Status: Resolved
    • Resolution: Fixed
    • 3.0-Release
    • None
    • None
    • 178


      Benchmark outline from Sudarshan Kadambi which describes a good use case for supporting an ARRAY data type. Supporting an ARRAY data type, an UNNEST built-in function, plus derived tables would allow the following standard query to be used (as opposed to creating custom aggregate functions):

      select avg(v)
      from (select unnest(value) v from t
      where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1')

      It'd be nice if you could just do an average over an array directly, but this would be non standard.

      On 04/26/2013 11:17 AM, Sudarshan Kadambi (BLOOMBERG) wrote:
      Hi James:

      Yes, I saw the email. Thank you for this generous offer. I wanted some time to make sure the benchmark correctly represents my use case.

      If you wish, here's a benchmark setup you could use:
      1. 1 Billion keys in table
      2. Query contains 250K randomly chosen object_ids and a single field_type (F1)
      3. Aggregation done on a single attribute (A1)
      4. Query:
      select avg(value) where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1'

      We would want the test done with and without the skip scan filter for the purpose of comparison.

      The reason why I wanted some time to think about it is that the values within each attribute is a JSON number array. So an avg across 2 values is a average of the averages.

      For e.g. Object_id=O1, Field_type=F1, Attrib_id=A1, Value 1:


      Object_id=O2, Field_type=F1, Attrib_id=A1, Value 2:


      The query should produce: Avg{Avg

      {Value1}, Avg{Value2}} = Avg{3,3} = 3
      If we were doing a sum, the query would produce: Sum{Avg{Value1}

      , Avg


      = Sum


      = 6.

      This might require customization of the aggregate function code.



          This comment will be Viewable by All Users Viewable by All Users


            ramkrishna Ramkrishna S. Vasudevan
            jamestaylor James R. Taylor
            0 Vote for this issue
            1 Start watching this issue




                Issue deployment