Details
Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Description
Story
As a data scientist, I want to perform session reconstruction on my data set, so that I can prepare it as input to other algorithms, such as path functions or predictive analytics algorithms.
This is a follow-on to
https://issues.apache.org/jira/browse/MADLIB-909
https://issues.apache.org/jira/browse/MADLIB-1001
that adds a minimum time (min_time) parameter.
Details
Add min_time to the existing parameters.
Proposed interface change:
sessionize (
source_table,
output_table,
partition_expr,
order_expr,
time_stamp,
time_out,
min_time, -- new
output_cols,
create_view
)
where
min_time (optional)
Minimum time that must elapse since the last valid event for an event to be considered valid (default=0). An event that occurs less than min_time after the last valid event is dropped and is not included in the current session. Same units as time_stamp.
Implementation notes
1) min_time should be specified in the same units as the time_out parameter.
2) Always compare against the last valid session event, not against any event that was just dropped.
For an example of how min_time could work, see Aster Analytics sessionization function [1].
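The min_time rule described above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not MADlib code: the function name drop_rapid_events is hypothetical, the events are assumed to be Python datetime values from a single partition that are already sorted, and the sample data is made up.

```python
from datetime import datetime, timedelta

def drop_rapid_events(timestamps, min_time=timedelta(0)):
    """Return the events kept under the proposed min_time rule.

    Hypothetical helper, for illustration only. `timestamps` must be
    sorted ascending and belong to one partition.
    """
    kept = []
    last_valid = None
    for ts in timestamps:
        # Note 2: compare against the last *valid* event,
        # never against an event that was just dropped.
        if last_valid is not None and ts - last_valid < min_time:
            continue  # less than min_time since the last valid event: drop
        kept.append(ts)
        last_valid = ts
    return kept

# Made-up events at 0, 2, 4, 6 and 20 seconds, with min_time = 5 seconds.
base = datetime(2016, 1, 1, 12, 0, 0)
events = [base + timedelta(seconds=s) for s in (0, 2, 4, 6, 20)]
kept = drop_rapid_events(events, min_time=timedelta(seconds=5))
kept_offsets = [int((ts - base).total_seconds()) for ts in kept]
print(kept_offsets)  # [0, 6, 20]
```

The event at 6 seconds is kept because it is 6 seconds after the last valid event (at 0 seconds). Had it been compared against the dropped event at 4 seconds, it would have been dropped as well, which is the case implementation note 2 guards against.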
References
[1] Aster Analytics users guide, see "sessionize" function
http://www.info.teradata.com/edownload.cfm?itemid=143450001
http://www.info.teradata.com/templates/eSrchResults.cfm?txtpid=&txtrelno=&prodline=all&frmdt=&txtsrchstring=aster%20analytics&srtord=Desc&todt=&rdSort=Date
https://www.youtube.com/watch?v=C760M9ttK9Q
Issue Links
- duplicates MADLIB-1028 add rapid_fire to sessionization (Closed)