  1. Calcite
  2. CALCITE-1237

Session windows for streaming SQL

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: stream
    • Labels: None

    Description

      A session window is a collection of rows whose key values, when sorted, are each separated from the next by a gap of at most N.

      Q1. Should "at most" be "less than"?
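      To make the definition and the boundary case in Q1 concrete, here is a minimal Python sketch. It is not Calcite code: the function name split_sessions and its inclusive flag are hypothetical, introduced only to contrast the "at most N" and "less than N" readings.

```python
from typing import List


def split_sessions(keys: List[float], gap: float, inclusive: bool = True) -> List[List[float]]:
    """Group key values into sessions.

    Sorted neighbours stay in the same session when their difference is
    at most `gap` (inclusive=True) or strictly less than `gap`
    (inclusive=False).
    """
    sessions: List[List[float]] = []
    for key in sorted(keys):
        if sessions:
            delta = key - sessions[-1][-1]
            same = delta <= gap if inclusive else delta < gap
            if same:
                sessions[-1].append(key)
                continue
        # First key, or the gap to the previous key is too large.
        sessions.append([key])
    return sessions


keys = [0, 3, 8, 20, 22]
at_most = split_sessions(keys, gap=5, inclusive=True)
less_than = split_sessions(keys, gap=5, inclusive=False)
print(at_most)    # [[0, 3, 8], [20, 22]]
print(less_than)  # [[0, 3], [8], [20, 22]]
```

      The two readings differ only when a gap is exactly N: here the keys 3 and 8 are exactly 5 apart, so they share a session under "at most" and are split under "less than".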

      The key can be of any type that has a minus operator, that is, a numeric or date-time type.

      I propose the following syntax: session(key [, key]*, interval). For example:

      select stream session(rowtime, productId, interval '5' second),
        productId, count(*) as c
      from Orders
      group by session(rowtime, productId, interval '5' second),
        productId
      

      This query finds bursts of orders for the same product in which consecutive orders are no more than 5 seconds apart.

      The first key column (rowtime) defines the session. It must be of numeric or date-time type, and must be monotonic (or similar) in order for the query to make progress. The other key columns (in this case productId) can be of any type. The last argument is the interval, which must be constant.
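      The reason monotonicity matters for progress can be sketched as follows. This is an illustration in Python under stated assumptions, not how Calcite would implement it: rowtime is an integer number of seconds that never decreases across the stream, the gap uses the "at most" reading, and the name stream_sessions is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]           # (rowtime in seconds, productId)
Result = Tuple[int, str, int]   # (session start, productId, count)


def stream_sessions(rows: Iterable[Row], gap: int) -> Iterator[Result]:
    """Emit each session as soon as rowtime proves it can no longer grow.

    Assumes rowtime is non-decreasing across the whole stream.
    """
    open_sessions: Dict[str, List[int]] = {}  # productId -> [start, last, count]
    for rowtime, product in rows:
        # Because rowtime never goes backwards, a session whose last row is
        # more than `gap` behind the current rowtime is complete.
        expired = [p for p, s in open_sessions.items() if rowtime - s[1] > gap]
        for pid in expired:
            start, _, count = open_sessions.pop(pid)
            yield (start, pid, count)
        if product in open_sessions:
            session = open_sessions[product]
            session[1] = rowtime   # extend the session
            session[2] += 1
        else:
            open_sessions[product] = [rowtime, rowtime, 1]
    # End of input; an unbounded stream would never reach this point.
    for pid, (start, _, count) in open_sessions.items():
        yield (start, pid, count)


rows = [(0, "A"), (2, "B"), (4, "A"), (12, "A"), (13, "B"), (30, "A")]
results = list(stream_sessions(rows, gap=5))
```

      Without the ordering assumption, a late row could still extend any session, so no session could ever be emitted before the end of the input.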

      The session function returns the key value at the start of the window. Unlike the hop function, each row belongs to precisely one window. But session is not a true function, because its value depends on the records flowing in the stream.
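      A small batch emulation may help show what the session value would look like per row. This is a Python sketch under assumptions, not the proposed implementation: the helper names session_starts and count_by_session are hypothetical, rowtime is an integer number of seconds, and the gap uses the "at most" reading.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def session_starts(orders: List[Tuple[int, str]], gap: int) -> List[Tuple[int, str, int]]:
    """For each order, return (rowtime, productId, session start)."""
    by_product: Dict[str, List[int]] = defaultdict(list)
    for rowtime, product in orders:
        by_product[product].append(rowtime)

    labelled: List[Tuple[int, str, int]] = []
    for product, times in by_product.items():
        start = 0
        prev: Optional[int] = None
        for t in sorted(times):
            if prev is None or t - prev > gap:
                start = t  # this row opens a new session
            labelled.append((t, product, start))
            prev = t
    return sorted(labelled)


def count_by_session(orders: List[Tuple[int, str]], gap: int) -> List[Tuple[int, str, int]]:
    """Emulate the example query: (session start, productId, count)."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for _, product, start in session_starts(orders, gap):
        counts[(start, product)] += 1
    return sorted((start, product, c) for (start, product), c in counts.items())


orders = [(0, "A"), (2, "B"), (4, "A"), (12, "A"), (13, "B"), (30, "A")]
labelled = session_starts(orders, gap=5)
summary = count_by_session(orders, gap=5)
```

      Each input row appears exactly once in labelled, which reflects the "precisely one window" property. The session start of the row at time 4 is 0 only because an earlier row for the same product exists, which is why the value cannot be computed from the row's own arguments alone.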

      Q2. If session is used, should we allow order-dependent aggregate functions such as first_value?

      Q3. Should we allow session as a windowed aggregate function?

            People

              Assignee: Unassigned
              Reporter: Julian Hyde (julianhyde)
              Votes: 0
              Watchers: 4
