  Metron (Retired) / METRON-392

Allow User to Define Custom 'Group By' for a Profile



    • Type: Improvement
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • None
    • 0.2.1BETA


      Models built from Profile data will usually be trained and scored not on all of the Profile data, but only on subsets, or segments, of it. For example, Mondays often look very different from Sundays, so when training and scoring a Monday, the model will use only data from previous Mondays.

      The current Profiler implementation embeds the day of week, week of month, month, and year in the row key before storing the data in HBase. This is intended to sort the data so that a subset of it can be read with a single contiguous scan when training; for example, a read that pulls in data from Mondays only.
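      To make the current scheme concrete, here is a minimal Java sketch of a row key that embeds the day of week, week of month, month, and year, in that order, after the profile and entity. The key layout, the `|` separator, the use of strings instead of HBase byte arrays, and the names `rowKey` and `prefixScan` are assumptions for illustration, not the Profiler's actual code. Because the day of week comes first among the time fields, all Monday rows share a key prefix, so a store that keeps rows sorted by key can return them with one contiguous scan.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoField;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TimeSegmentedRowKey {

    /**
     * Hypothetical key layout: profile|entity|dayOfWeek|weekOfMonth|month|year|period.
     * Strings stand in for the byte-array keys HBase actually stores.
     */
    static String rowKey(String profile, String entity, long epochMillis, long periodMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return String.join("|",
                profile,
                entity,
                String.valueOf(t.getDayOfWeek().getValue()),              // 1 = Monday ... 7 = Sunday
                String.valueOf(t.get(ChronoField.ALIGNED_WEEK_OF_MONTH)), // 1 ... 5
                String.format("%02d", t.getMonthValue()),
                String.valueOf(t.getYear()),
                String.format("%012d", epochMillis / periodMillis));      // zero-padded period id
    }

    /** Emulates a prefix scan over keys kept in lexicographic order, as HBase keeps rows. */
    static List<String> prefixScan(TreeSet<String> sortedKeys, String prefix) {
        List<String> result = new ArrayList<>();
        // Start at the first key >= prefix and stop at the first key that no longer matches
        for (String key : sortedKeys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        long periodMillis = 15 * 60 * 1000L; // hypothetical 15-minute profile period
        TreeSet<String> keys = new TreeSet<>();
        ZonedDateTime start = ZonedDateTime.of(2016, 8, 1, 12, 0, 0, 0, ZoneOffset.UTC); // a Monday
        for (int day = 0; day < 28; day++) {
            keys.add(rowKey("bytes-in", "10.0.0.1", start.plusDays(day).toInstant().toEpochMilli(), periodMillis));
        }
        // Every Monday shares this prefix, so one contiguous scan returns exactly those rows
        List<String> mondays = prefixScan(keys, "bytes-in|10.0.0.1|1|");
        System.out.println(mondays.size() + " Monday rows");
    }
}
```

      With 28 consecutive days starting on a Monday, the scan returns four rows. The same trick only helps for segments that happen to be baked into the key; a segment such as "company holidays" has no prefix to scan on.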

      The problem with this approach is that segmenting the data properly for the problem at hand is as important to building an effective model as feature selection. Segmenting on day of week, week of month, etc. will not suit many of the models that users build.

      In addition, no single segmentation scheme applies to all Profiles. Each Profile is likely to need its own way of segmenting the data.

      Users will also need to segment the data by elements that only make sense in their specific environment. For example, a company will have its own holiday calendar, or specific 'end-of-month' processing days that need to be taken into account. A user needs to be able to apply these custom elements when segmenting the data.

      This change will allow a user to define, as part of a Profile definition, how the data is grouped when it is stored in HBase.
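      One way to picture the change is a Profile definition that carries its own ordered list of group-by functions, each contributing one segment of the row key ahead of the period. In this hypothetical Java sketch, `ProfileDefinition`, `rowKey`, `prefixScan`, the holiday dates, and the "last two days of the month" rule are all invented for illustration; a real Profile definition would more likely express the groups in configuration than as Java lambdas.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class CustomGroupBy {

    /** Hypothetical Profile definition: a name plus an ordered list of user-defined group-by functions. */
    static final class ProfileDefinition {
        final String name;
        final List<Function<ZonedDateTime, String>> groupBy;

        ProfileDefinition(String name, List<Function<ZonedDateTime, String>> groupBy) {
            this.name = name;
            this.groupBy = groupBy;
        }
    }

    /** Builds profile|entity|group1|...|groupN|period; each group-by function adds one segment. */
    static String rowKey(ProfileDefinition profile, String entity, long epochMillis, long periodMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        List<String> parts = new ArrayList<>();
        parts.add(profile.name);
        parts.add(entity);
        for (Function<ZonedDateTime, String> group : profile.groupBy) {
            parts.add(group.apply(t));
        }
        parts.add(String.format("%012d", epochMillis / periodMillis));
        return String.join("|", parts);
    }

    /** Emulates a prefix scan: every sorted key in the range [prefix, prefix + U+FFFF). */
    static NavigableSet<String> prefixScan(TreeSet<String> sortedKeys, String prefix) {
        return sortedKeys.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    /** Example profile whose groups come from a hypothetical company calendar. */
    static ProfileDefinition exampleProfile() {
        Set<LocalDate> holidays = Set.of(LocalDate.of(2016, 9, 5), LocalDate.of(2016, 11, 24));
        Function<ZonedDateTime, String> holiday =
                t -> holidays.contains(t.toLocalDate()) ? "holiday" : "workday";
        // Treat the last two days of each month as 'end-of-month' processing days
        Function<ZonedDateTime, String> monthEnd =
                t -> t.getDayOfMonth() > t.toLocalDate().lengthOfMonth() - 2 ? "month-end" : "mid-month";
        return new ProfileDefinition("bytes-in", List.of(holiday, monthEnd));
    }

    public static void main(String[] args) {
        ProfileDefinition profile = exampleProfile();
        long periodMillis = 15 * 60 * 1000L; // hypothetical 15-minute profile period
        TreeSet<String> keys = new TreeSet<>();
        ZonedDateTime start = ZonedDateTime.of(2016, 9, 1, 12, 0, 0, 0, ZoneOffset.UTC);
        for (int day = 0; day < 30; day++) {
            keys.add(rowKey(profile, "10.0.0.1", start.plusDays(day).toInstant().toEpochMilli(), periodMillis));
        }
        System.out.println(prefixScan(keys, "bytes-in|10.0.0.1|holiday|").size());           // Sept 5 only
        System.out.println(prefixScan(keys, "bytes-in|10.0.0.1|workday|month-end|").size()); // Sept 29 and 30
    }
}
```

      With the example calendar, the holiday scan returns one row (September 5) and the workday month-end scan returns two (September 29 and 30). Placing the groups ahead of the period keeps each segment contiguous, just as day of week did in the fixed scheme, but the user now chooses what the segments are.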

              nickwallen Nick Allen
              nickwallen Nick Allen