Chukwa / CHUKWA-278

Improve post process manager and metric data loader to support data loading from pig aggregation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: Data Processors
    • Labels: None
    • Environment: Redhat EL 5.1, Java 6

      Description

      The current post process manager loads sequence files into a database only. Ideally, it should be able to load data through different data loaders. Hence, DataLoaderFactory should be defined as an interface or abstract class so that third parties can implement their own data loaders. The current pig aggregator implementation generates down-sampled data as different record types, so Metric Data Loader needs a new capability to identify the table partition based on record type.
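To make the abstract-class idea concrete, here is a minimal sketch. Only the name DataLoaderFactory comes from this issue; the `load` signature, `RecordingDataLoader`, and `PostProcessSketch` are hypothetical names chosen for illustration and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the method name and signature are assumptions,
// not the API introduced by the CHUKWA-278 patches.
abstract class DataLoaderFactory {
    /** Loads the given sequence files into whatever store the subclass targets. */
    abstract void load(List<String> sequenceFiles);
}

/** Illustrative third-party loader that only records what it was asked to load. */
final class RecordingDataLoader extends DataLoaderFactory {
    final List<String> loaded = new ArrayList<>();

    @Override
    void load(List<String> sequenceFiles) {
        loaded.addAll(sequenceFiles);
    }
}

/** Stand-in for the post process manager: it depends only on the abstract type. */
final class PostProcessSketch {
    static void process(DataLoaderFactory loader, List<String> sequenceFiles) {
        loader.load(sequenceFiles);
    }
}
```

Because the caller depends only on the abstract type, a database loader and any other loader could be swapped without changing the post-processing code.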

      1. CHUKWA-278.patch
        64 kB
        Eric Yang
      2. CHUKWA-278-1.patch
        66 kB
        Eric Yang
      3. CHUKWA-278-2.patch
        74 kB
        Eric Yang
      4. CHUKWA-278-3.patch
        60 kB
        Eric Yang

        Activity

        eyang Eric Yang added a comment -
        • Added new DataLoaderFactory.
        • Added ability to determine time partition based on record type.
        asrabkin Ari Rabkin added a comment -

        I don't see a DataLoaderFactory class in this patch...

        eyang Eric Yang added a comment -

        Updated patch with DataLoaderFactory and MetricDataLoader.

        jboulon Jerome Boulon added a comment -

        -1
        Could you use standard JDBC to get metadata information? MetricDataLoader should be compatible with all database vendors. This would simplify testing (Derby could be used instead of MySQL) and open up new opportunities, such as Oracle support.

        • Things like "String query = "select * from " + table + "_template limit 1" " prevent the use of any other database vendor.
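As a rough illustration of the vendor-neutral approach suggested here, the sketch below reads column types through the standard `DatabaseMetaData.getColumns` call instead of a `limit 1` query. The class and method names (`ColumnTypeSketch`, `readColumnTypes`, `columnTypes`) are hypothetical and are not taken from the patch.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not code from the CHUKWA-278 patches.
final class ColumnTypeSketch {
    /** Maps column name to java.sql.Types code from a getColumns() result set. */
    static Map<String, Integer> readColumnTypes(ResultSet columns) throws SQLException {
        Map<String, Integer> types = new LinkedHashMap<>();
        while (columns.next()) {
            // COLUMN_NAME and DATA_TYPE are labels defined by the JDBC specification.
            types.put(columns.getString("COLUMN_NAME"), columns.getInt("DATA_TYPE"));
        }
        return types;
    }

    /** Looks up column types for a table without any vendor-specific SQL. */
    static Map<String, Integer> columnTypes(Connection conn, String table) throws SQLException {
        // The table argument is a LIKE-style pattern; "%" matches every column name.
        try (ResultSet columns = conn.getMetaData().getColumns(null, null, table, "%")) {
            return readColumnTypes(columns);
        }
    }
}
```

Since nothing here depends on MySQL syntax, the same code should work against Derby or another database whose JDBC driver implements the metadata calls.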
        eyang Eric Yang added a comment -

        Updated metadata collection for column types to use the JDBC interface.

        macyang Mac Yang added a comment -

        Took a quick look. My two cents:

        1. It would be better to get columns by name instead of by index
        + String name = rs.getString(4);
        + int type = rs.getInt(5);

        2. The following pattern compilation should be outside of the while loop
        + Pattern p = Pattern.compile("(.*)_\\d+");
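The second point can be sketched as follows, assuming the quoted pattern is `(.*)_\\d+` (the backslash appears to have been lost in the page rendering) and assuming it is used to strip a trailing numeric suffix from table names. `PartitionNameSketch` and `baseNames` are hypothetical names, not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not code from the CHUKWA-278 patches.
final class PartitionNameSketch {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NUMERIC_SUFFIX = Pattern.compile("(.*)_\\d+");

    /** Strips a trailing "_<digits>" from each name; other names pass through unchanged. */
    static List<String> baseNames(List<String> tableNames) {
        List<String> result = new ArrayList<>();
        for (String name : tableNames) {
            // Only the cheap Matcher is created per iteration, not the Pattern.
            Matcher m = NUMERIC_SUFFIX.matcher(name);
            result.add(m.matches() ? m.group(1) : name);
        }
        return result;
    }
}
```

Hoisting the `Pattern.compile` call out of the loop avoids re-parsing the same regular expression for every row.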

        eyang Eric Yang added a comment -

        Updated patch to compile the Pattern outside of the while loop and to use ColumnName to determine the table schema.

        terencekwan Terence Kwan added a comment -

        Looks good, +1

        eyang Eric Yang added a comment -

        I just committed this, thanks Terence.

        hudson Hudson added a comment -

        Integrated in Chukwa-trunk #49 (see http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/49/).
        Improve post process manager and metric data loader to support data loading from pig aggregation. (Eric Yang)


          People

          • Assignee: Eric Yang (eyang)
          • Reporter: Eric Yang (eyang)