Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-309

Support ARRAY data type

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Task
    • Status: Resolved
    • Resolution: Fixed
    • 3.0-Release
    • None
    • None
    • 178

    Description

      Benchmark outline from Sudarshan Kadambi which describes a good use case for supporting an ARRAY data type. Supporting an ARRAY data type, an UNNEST built-in function, plus derived tables would allow the following standard query to be used (as opposed to creating custom aggregate functions):

      select avg(v)
      from (select unnest(value) v from t
      where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1')

      It'd be nice if you could just do an average over an array directly, but this would be non standard.

      On 04/26/2013 11:17 AM, Sudarshan Kadambi (BLOOMBERG) wrote:
      Hi James:

      Yes, I saw the email. Thank you for this generous offer. I wanted some time to make sure the benchmark correctly represents my use case.

      If you wish, here's a benchmark setup you could use:
      1. 1 Billion keys in table
      2. Query contains 250K randomly chosen object_ids and a single field_type (F1)
      3. Aggregation done on a single attribute (A1)
      4. Query:
      select avg(value) where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1'

      We would want the test done with and without the skip scan filter for the purpose of comparison.

      The reason why I wanted some time to think about it is that the values within each attribute is a JSON number array. So an avg across 2 values is a average of the averages.

      For e.g. Object_id=O1, Field_type=F1, Attrib_id=A1, Value 1:

      {1,2,3,4,5,}

      Object_id=O2, Field_type=F1, Attrib_id=A1, Value 2:

      {1,2,3,4,5}

      The query should produce: Avg{Avg

      {Value1}, Avg{Value2}} = Avg{3,3} = 3
      If we were doing a sum, the query would produce: Sum{Avg{Value1}

      , Avg

      {Value2}

      = Sum

      {3,3}

      = 6.

      This might require customization of the aggregate function code.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            ramkrishna Ramkrishna S. Vasudevan
            jamestaylor James R. Taylor
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment