Apache MADlib / MADLIB-1001

Sessionization - Phase 2 (output controls)


Details

    Description

      Story

      As a data scientist, I want to perform session reconstruction on my data set, so that I can prepare it as input for other algorithms, such as path functions or predictive analytics algorithms.

      This is a follow-on to
      https://issues.apache.org/jira/browse/MADLIB-909
      that adds optional output controls.

      Details

      Proposed interface changes:

      sessionize (
         source_table,
         output_table,
         partition_expr,
         time_stamp,
         max_time,
         output_cols,  -- new
         create_view   -- new
         )
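
To make the proposed argument order and defaults concrete, here is a minimal Python sketch that renders a call to the function as a SQL statement. The helper names (sql_literal, sessionize_call), the madlib schema prefix, and the choice to pass every argument as a quoted literal are assumptions made for illustration; only the parameter names and the two defaults come from this issue.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (handles None, bool, and text only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    # Double any embedded single quotes so the literal stays well-formed.
    return "'" + str(value).replace("'", "''") + "'"


def sessionize_call(source_table, output_table, partition_expr, time_stamp,
                    max_time, output_cols="*", create_view=True):
    """Build the SQL text of a sessionize call using the proposed argument order.

    Assumption: the function lives in a schema named 'madlib'.
    """
    args = [source_table, output_table, partition_expr, time_stamp,
            max_time, output_cols, create_view]
    rendered = ", ".join(sql_literal(a) for a in args)
    return "SELECT madlib.sessionize(" + rendered + ");"


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(sessionize_call("eventlog", "sessions", "user_id",
                          "event_timestamp", "0:30:00"))
```

Leaving out the last two arguments in the sketch falls back to the defaults described in this issue: all input columns plus the session column, written to a view.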
      

      where

      output_cols (optional)
      TEXT.
      '*' (asterisk) – all columns in the input table plus the session column (default).
      'x, y, z, ...' – the listed columns plus the session column. The list may include the partition expression or any other expressions. The form '*, expr1, expr2, ...' should also be supported, meaning all input columns plus the extra expressions listed. The value needs to be a valid SELECT expression.

      For example, in the path function http://madlib.incubator.apache.org/docs/latest/group__grp__path.html#examples
      we do a similar thing for the aggregate function parameter.
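
The three accepted forms of output_cols can be sketched as a small Python helper that turns the argument into an explicit SELECT list. This is a hedged illustration, not the MADlib implementation: the function name resolve_output_cols and the source_columns parameter (the input table's column names, which a real implementation would read from the catalog) are hypothetical, and identifiers are not quoted.

```python
def resolve_output_cols(output_cols, source_columns):
    """Turn the output_cols argument into an explicit SELECT list (sketch).

    source_columns is a hypothetical list of the input table's column names.
    """
    spec = (output_cols or "*").strip()
    all_cols = ", ".join(source_columns)

    if spec == "*":
        # Default: every input column (the session column is appended elsewhere).
        return all_cols

    if spec.startswith("*"):
        rest = spec[1:].lstrip()
        if rest.startswith(","):
            # '*, expr1, expr2': all input columns plus the extra expressions,
            # which are kept verbatim.
            return all_cols + ", " + rest[1:].strip()

    # Anything else is passed through unchanged; the database decides
    # whether it is a valid SELECT list.
    return spec


if __name__ == "__main__":
    cols = ["user_id", "event_timestamp", "page"]  # hypothetical columns
    print(resolve_output_cols("*, page = 'CHECKOUT' AS is_checkout", cols))
```

Expanding the asterisk into named columns is a design choice of this sketch: it keeps the output independent of any helper columns an implementation might add internally while computing sessions.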

      create_view (optional)
      BOOLEAN, default: TRUE. Determines whether the output is created as a view (TRUE) or materialized as a table (FALSE). If you need the session information only once, creating a view could be significantly faster than materializing a table.
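
The effect of create_view can be sketched as a Python helper that emits either a CREATE VIEW or a CREATE TABLE statement around the same sessionization query. Everything here is an assumption for illustration: the function name build_sessionize_sql, the select_list parameter, the session_id and new_session column names, and the window-function formulation (a common way to assign sessions in PostgreSQL-style SQL) are not taken from this issue. The sketch does no identifier quoting and does not treat NULL time stamps specially.

```python
def build_sessionize_sql(source_table, output_table, partition_expr,
                         time_stamp, max_time, select_list, create_view=True):
    """Build a CREATE VIEW or CREATE TABLE statement that adds a session_id column.

    select_list must name columns explicitly; a bare '*' would also expose
    the internal new_session helper column.
    """
    kind = "VIEW" if create_view else "TABLE"
    return f"""CREATE {kind} {output_table} AS
SELECT {select_list},
       SUM(new_session) OVER (PARTITION BY {partition_expr}
                              ORDER BY {time_stamp}) AS session_id
FROM (
    SELECT *,
           -- 1 marks the first event of a partition or a gap longer than max_time
           CASE WHEN {time_stamp} - LAG({time_stamp}) OVER (
                         PARTITION BY {partition_expr} ORDER BY {time_stamp}
                     ) <= INTERVAL '{max_time}'
                THEN 0 ELSE 1 END AS new_session
    FROM {source_table}
) AS flagged"""


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(build_sessionize_sql("eventlog", "sessions", "user_id",
                               "event_timestamp", "0:30:00",
                               "user_id, event_timestamp"))
```

With create_view=True nothing is computed until the view is queried, which is why a one-off use can avoid the cost of writing a table; with create_view=False the result is computed once and stored.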

      Attachments

        1. MADlib_ Sessionize_user_doc_v2.pdf
          167 kB
          Frank McQuillan

          People

            njayaram Nandish Jayaram
            fmcquillan Frank McQuillan