Apache Gora / GORA-413

Support creation of dynamic columns within Gora datastore mapping designs

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6
    • Fix Version/s: 1.0
    • Component/s: gora-hbase
    • Labels: None

    Description

      The ongoing discussion about dynamically generating HBase columns has shown that new functionality must be added to Gora to achieve this.
      The main driver for this issue is that Chukwa logs need to dynamically create a large number of columns over time, directly dependent on the number of data chunks we receive. Each data chunk has a [Sequence ID], and this Sequence ID should be the column name.

      The table design will look like this:

      Row Key: [Invert Date]:[Data Type]:[Primary Key]
      Column Family: log
      Column Name: [Sequence ID]
      Timestamp: [log entry timestamp]
      
      Example:
      
      Row Key: 2132013102:TT:host1.example.com
      Column Family: log
      Column Name: 1230
      Cell Value: 2013-01-23 12:01:30 INFO This is a log entry.
      Timestamp: 1358942490
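As a rough illustration of the design above, the following Java sketch models the table in memory as sorted maps (row key -> column family -> column name -> cell), mirroring the way HBase keeps rows and columns sorted. It is not Gora or HBase API: the class `ChukwaLogLayout` and its methods are hypothetical. Because this issue does not specify how the inverted date is computed, the sketch takes it as an opaque string. What it does show is the core of the requested feature: each new sequence ID becomes a new column name at write time, with no column declared in advance.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of the proposed layout:
 * row key -> column family -> column name -> cell, every level kept sorted.
 * Not part of Gora or HBase.
 */
final class ChukwaLogLayout {

    /** Cell value plus its timestamp (seconds, as in the example above). */
    record Cell(String value, long timestamp) {}

    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, Cell>>> rows =
            new TreeMap<>();

    /** Builds "[Invert Date]:[Data Type]:[Primary Key]"; the inverted date is passed in unchanged. */
    static String rowKey(String invertedDate, String dataType, String primaryKey) {
        return invertedDate + ":" + dataType + ":" + primaryKey;
    }

    /** Stores one data chunk; its sequence ID becomes a column name in the "log" family on the fly. */
    void putChunk(String rowKey, long sequenceId, String logLine, long timestamp) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent("log", family -> new TreeMap<>())
            .put(Long.toString(sequenceId), new Cell(logLine, timestamp));
    }

    /** Returns the "log" columns of one row, or an empty map if the row or family does not exist. */
    NavigableMap<String, Cell> logColumns(String rowKey) {
        NavigableMap<String, NavigableMap<String, Cell>> families = rows.get(rowKey);
        if (families == null || !families.containsKey("log")) {
            return new TreeMap<>();
        }
        return families.get("log");
    }

    public static void main(String[] args) {
        ChukwaLogLayout table = new ChukwaLogLayout();
        String key = rowKey("2132013102", "TT", "host1.example.com");
        table.putChunk(key, 1230, "2013-01-23 12:01:30 INFO This is a log entry.", 1358942490L);
        // Prints: 2132013102:TT:host1.example.com -> [1230]
        System.out.println(key + " -> " + table.logColumns(key).keySet());
    }
}
```

In a real Gora mapping, the open question this issue raises is how to express that the set of column names in the `log` family is not fixed at design time.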
      

      The inverted date allows the table to be partitioned more easily by hour, day of the month, or month.
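The date component leads the row key, so all rows for one period share a key prefix. Because HBase stores rows sorted by key, those rows can be read with a single scan bounded by a start row and a stop row. The sketch below illustrates the idea, using a Java `TreeMap` as a stand-in for the table. `PrefixScan` and its methods are hypothetical. The sketch also assumes that the period of interest corresponds to a leading prefix of the inverted date, which depends on an inversion scheme this issue does not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper: read every row whose key starts with a given date prefix. */
final class PrefixScan {

    /**
     * Exclusive stop key for a prefix: the prefix with its last character incremented.
     * Every key that starts with the prefix sorts before it. Assumes a non-empty prefix
     * whose last character is not Character.MAX_VALUE (true for digit strings).
     */
    static String stopKey(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** Returns the row keys in [prefix, stopKey(prefix)), i.e. exactly the keys starting with prefix. */
    static List<String> scanPrefix(NavigableMap<String, String> table, String prefix) {
        return new ArrayList<>(table.subMap(prefix, true, stopKey(prefix), false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        // Made-up row keys: two share the same date component, one belongs to the next value.
        table.put("2132013102:TT:host1.example.com", "row A");
        table.put("2132013102:TT:host2.example.com", "row B");
        table.put("2132013103:TT:host1.example.com", "row C");
        // Prints: [2132013102:TT:host1.example.com, 2132013102:TT:host2.example.com]
        System.out.println(scanPrefix(table, "2132013102"));
    }
}
```

A shorter prefix selects a coarser period, provided the inversion places that period's digits first.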
      Using consecutive sequence IDs as column names allows fast retrieval in a linear scan. This format is well suited to quickly retrieving an hour's worth of logs for a single node. Hence, if we batch-scan the table in a rolling window with a MapReduce job every hour, the workload is spread evenly across multiple MapReduce tasks.
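The linear-scan argument relies on consecutive sequence IDs being stored next to each other. HBase sorts column names as raw bytes in lexicographic order, so plain decimal strings of different lengths do not sort numerically: "999" sorts after "1230". One possible way to keep numeric and lexicographic order aligned, assuming non-negative IDs, is to use fixed-width, zero-padded column names. The Java sketch below shows this approach and how a contiguous range of IDs can then be read as one slice. The class `SequenceQualifiers` is hypothetical, and a `TreeMap` stands in for a real HBase row.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper: encode sequence IDs so column-name order matches numeric order. */
final class SequenceQualifiers {

    /** 19 digits hold any non-negative long (Long.MAX_VALUE = 9223372036854775807). */
    static final int WIDTH = 19;

    /** Zero-padded, fixed-width column name for a sequence ID. */
    static String qualifier(long sequenceId) {
        if (sequenceId < 0) {
            throw new IllegalArgumentException("sequence IDs are assumed to be non-negative");
        }
        return String.format("%0" + WIDTH + "d", sequenceId);
    }

    /** Columns for sequence IDs in [from, to), read as one contiguous slice. */
    static NavigableMap<String, String> range(NavigableMap<String, String> columns, long from, long to) {
        return columns.subMap(qualifier(from), true, qualifier(to), false);
    }

    public static void main(String[] args) {
        // Unpadded decimal strings do not sort numerically: "999" sorts after "1230".
        System.out.println("999".compareTo("1230") > 0);                   // true
        System.out.println(qualifier(999).compareTo(qualifier(1230)) < 0); // true

        NavigableMap<String, String> columns = new TreeMap<>();
        for (long id = 1228; id <= 1232; id++) {
            columns.put(qualifier(id), "log line " + id);
        }
        // Prints: [log line 1229, log line 1230, log line 1231]
        System.out.println(range(columns, 1229, 1232).values());
    }
}
```

A fixed-width binary encoding would also preserve order for non-negative IDs. Zero padding simply keeps the column names human-readable, as in the example above.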

People

    • Assignee: Unassigned
    • Reporter: Lewis John McGibbney (lewismc)
    • Votes: 0
    • Watchers: 1