Chukwa / CHUKWA-734

Gora Storage System for Chukwa Logs

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: Data Collection
    • Labels:
      None

      Description

      I would like to build a Gora-backed log-to-datastore module for Chukwa. I am going to work on this today.
      Gora is an in-memory data modeling and storage abstraction.
      http://gora.apache.org
      Gora powers Apache Nutch 2.x, which generates a lot of log data. Having a Chukwa monitoring tool for Nutch would be grand.

      1. CHUKWA-734v3.patch
        55 kB
        Eric Yang
      2. CHUKWA-734v2.patch
        55 kB
        Lewis John McGibbney
      3. CHUKWA-734.patch
        39 kB
        Lewis John McGibbney

        Activity

        eyang Eric Yang added a comment -

        This would be great to have. My recommendation is to write a Gora Writer class that extends PipelineWriter. The timestamp or time partition is the primary element of a log file; however, it is not a good idea to use a monotonically increasing row key in HBase or any other Bigtable-style database, because consecutive writes all land on the same region. What would you recommend as the primary key design, and how could it ensure that load is spread evenly across HBase region servers? Another JIRA, CHUKWA-667, discusses row key design, but I am not satisfied with the row key design that I outlined there. Having Gora in the mix may enable some interesting optimizations.
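        A minimal sketch of the stage described above, assuming a writer that persists each batch of chunks and then forwards the batch to the next stage in the pipeline. `Chunk`, `Stage`, `Store` and `StoreWriter` are hypothetical stand-ins written for illustration; they are not Chukwa's PipelineWriter or Gora's DataStore API, and a real GoraWriter would delegate `put` to a Gora datastore.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
  /** Hypothetical, simplified chunk; the real Chukwa Chunk carries more metadata. */
  static final class Chunk {
    final String dataType;
    final long seqId;
    final String data;

    Chunk(String dataType, long seqId, String data) {
      this.dataType = dataType;
      this.seqId = seqId;
      this.data = data;
    }
  }

  /** Hypothetical pipeline stage: accepts a batch of chunks. */
  interface Stage {
    void add(List<Chunk> chunks);
  }

  /** Hypothetical storage abstraction standing in for a Gora datastore. */
  interface Store {
    void put(Chunk chunk);
  }

  /** Persists every chunk, then forwards the same batch to the next stage, if any. */
  static final class StoreWriter implements Stage {
    private final Store store;
    private final Stage next; // null means this writer is the last stage

    StoreWriter(Store store, Stage next) {
      this.store = store;
      this.next = next;
    }

    @Override
    public void add(List<Chunk> chunks) {
      for (Chunk c : chunks) {
        store.put(c);
      }
      if (next != null) {
        next.add(chunks);
      }
    }
  }

  public static void main(String[] args) {
    List<Chunk> stored = new ArrayList<>();
    List<Chunk> forwarded = new ArrayList<>();
    Stage writer = new StoreWriter(stored::add, forwarded::addAll);
    writer.add(List.of(new Chunk("SysLog", 1L, "line 1"), new Chunk("SysLog", 2L, "line 2")));
    System.out.println(stored.size() + " stored, " + forwarded.size() + " forwarded"); // 2 stored, 2 forwarded
  }
}
```

        Forwarding the batch after storing it is what lets such a writer sit in the middle of a pipeline rather than only at its end; passing `null` as the next stage makes it terminal.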

        lewismc Lewis John McGibbney added a comment -

        We just released Apache Gora 0.6 today, so I thought I would put this together as an initial patch to build upon.

        The initial patch contains:

        • an implementation of GoraWriter, documented as thoroughly as I could
        • a Gora implementation of the Chukwa Chunk, i.e. its data and metadata
        • an HBase mapping (gora-hbase-mapping.xml)
        • a gora.properties file
        • the gora-hbase dependency, as well as the required gora-hadoop-X dependencies, in pom.xml

        What you need to do to get it working:

        • uncomment the gora-hbase dependency in pom.xml
        • use GoraWriter as the writer implementation in agent-conf (see the patch for the addition to this file)
        • run mvn install

        What I would like from you guys:

        • try giving it a spin and see whether you can use it. If you can't, I would very much appreciate the feedback.

        Some notes:

        • Gora 0.6 supports HBase 0.98.8-hadoop2.
        • Supported Hadoop versions are 1.2.1 and 2.5.2.
        • We use Avro for serialization, so everything in HBase will be stored as Avro-serialized data.

        Some things Eric Yang and I still need to sort out:

        • What does the primary key look like?

        Next steps:

        • I get feedback on this.
        • I think about primary key support.
        • I write some tests using Gora's MemStore to simulate mapping Chukwa chunk data to a Gora datastore.
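        The MemStore test idea above can be sketched without any Gora dependency. This assumes a chunk is persisted as a payload plus a metadata map; `ChunkRecord`, `InMemoryStore`, `toRecord` and the metadata keys are hypothetical, and `InMemoryStore` is a plain sorted map standing in for Gora's MemStore, not its API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ChunkRecordSketch {
  /** Hypothetical persistent bean: the chunk payload plus its metadata as key/value pairs. */
  static final class ChunkRecord {
    final byte[] data;
    final Map<String, String> metadata;

    ChunkRecord(byte[] data, Map<String, String> metadata) {
      this.data = data;
      this.metadata = metadata;
    }
  }

  /** Hypothetical in-memory store kept sorted by key, standing in for Gora's MemStore. */
  static final class InMemoryStore {
    private final TreeMap<String, ChunkRecord> rows = new TreeMap<>();

    void put(String key, ChunkRecord record) { rows.put(key, record); }
    ChunkRecord get(String key) { return rows.get(key); }
    int size() { return rows.size(); }
  }

  /** Maps chunk fields onto the record; which fields count as metadata is an assumption. */
  static ChunkRecord toRecord(String source, String dataType, long seqId, String payload) {
    Map<String, String> meta = new HashMap<>();
    meta.put("source", source);
    meta.put("dataType", dataType);
    meta.put("seqId", Long.toString(seqId));
    return new ChunkRecord(payload.getBytes(StandardCharsets.UTF_8), meta);
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    store.put("host1:1230", toRecord("host1", "SysLog", 1230L, "INFO hello"));
    ChunkRecord back = store.get("host1:1230");
    // Round trip: metadata and payload come back unchanged.
    System.out.println(back.metadata.get("dataType") + " " + new String(back.data, StandardCharsets.UTF_8)); // SysLog INFO hello
  }
}
```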
        eyang Eric Yang added a comment -

        I got an error running the TestHBaseWriter unit test:

        -------------------------------------------------------------------------------
        Test set: org.apache.hadoop.chukwa.datacollection.writer.TestHBaseWriter
        -------------------------------------------------------------------------------
        Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.707 sec <<< FAILURE!
        testWriters(org.apache.hadoop.chukwa.datacollection.writer.TestHBaseWriter) Time elapsed: 0.582 sec <<< ERROR!
        java.lang.IncompatibleClassChangeError: Implementing class
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.hadoop.hbase.mapreduce.MapreduceTestingShim.<clinit>(MapreduceTestingShim.java:45)
        at org.apache.hadoop.hbase.HBaseTestingUtility.createDirsAndSetProperties(HBaseTestingUtility.java:606)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:535)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:880)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:805)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:776)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:745)
        at org.apache.hadoop.chukwa.datacollection.writer.TestHBaseWriter.setUp(TestHBaseWriter.java:67)
        at junit.framework.TestCase.runBare(TestCase.java:132)
        at junit.framework.TestResult$1.protect(TestResult.java:110)
        at junit.framework.TestResult.runProtected(TestResult.java:128)
        at junit.framework.TestResult.run(TestResult.java:113)
        at junit.framework.TestCase.run(TestCase.java:124)
        at junit.framework.TestSuite.runTest(TestSuite.java:243)
        at junit.framework.TestSuite.run(TestSuite.java:238)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
        at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
        at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
        at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
        at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:81)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)

        This exception happens when using Hadoop 1.2.1 + HBase 0.98.8 + Hadoop-compat1. Does Gora support Hadoop1?
        We probably need to set up another profile for enabling Hadoop 1 vs. Hadoop 2.

        For the table schema and row key design, maybe we can use something like this:

        Row Key: [Inverted Date]:[Data Type]:[Primary Key]
        Column Family: log
        Column Name: [Sequence ID]
        Timestamp: [log entry timestamp]

        Example:

        Row Key: 2132013102:TT:host1.example.com
        Column Family: log
        Column Name: 1230
        Cell Value: 2013-01-23 12:01:30 INFO This is a log entry.
        Timestamp: 1358942490

        The inverted date allows the table to be partitioned by hour, day of the month, or month more easily.
        Using the sequence ID as the column name keeps consecutive entries together, allowing fast retrieval in a linear scan. This format is well suited to quickly retrieving an hour's worth of logs for a node. Hence, if we batch-scan the table in a rolling window with a MapReduce job every hour, the workload is spread evenly across multiple MapReduce tasks.
        Can Gora map the sequence ID value to a column name in HBase?
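        A sketch of the proposed [Inverted Date]:[Data Type]:[Primary Key] key, assuming "inverted date" means reversing the digits of a UTC yyyyMMddHH hour bucket. That reading is an assumption: it yields 2132103102 for the example timestamp rather than the 2132013102 shown above. `RowKeySketch.rowKey` is a hypothetical helper; the second half of `main` counts leading digits over 24 hourly buckets to show that reversed keys spread across key ranges, whereas a forward date prefix such as "2013..." would put every hour at the same end of the table.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

public class RowKeySketch {
  // Hourly bucket in UTC; both the granularity and the time zone are assumptions.
  private static final DateTimeFormatter HOUR =
      DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

  /** Hypothetical helper: [Inverted Date]:[Data Type]:[Primary Key], inverting by digit reversal. */
  static String rowKey(long epochSeconds, String dataType, String primaryKey) {
    String hour = HOUR.format(Instant.ofEpochSecond(epochSeconds));
    String inverted = new StringBuilder(hour).reverse().toString();
    return inverted + ":" + dataType + ":" + primaryKey;
  }

  public static void main(String[] args) {
    long start = 1358942490L; // 2013-01-23 12:01:30 UTC, the example timestamp
    System.out.println(rowKey(start, "TT", "host1.example.com")); // 2132103102:TT:host1.example.com

    // The reversed key's first digit is the last digit of the hour, so it cycles
    // through 0-9 as time advances instead of staying fixed.
    Map<Character, Integer> perLeadingDigit = new TreeMap<>();
    for (int h = 0; h < 24; h++) {
      String key = rowKey(start + h * 3600L, "TT", "host1.example.com");
      perLeadingDigit.merge(key.charAt(0), 1, Integer::sum);
    }
    System.out.println(perLeadingDigit);
  }
}
```

        Within a single hour all keys still share a prefix, so this spreads load across hours rather than within one; the data type and primary key components are what separate concurrent writers.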

        lewismc Lewis John McGibbney added a comment -

        Hi Eric Yang,

        > I got an error for running TestHBaseWriter unit test:

        My tests hang on

          Running org.apache.hadoop.chukwa.datacollection.sender.TestAcksOnFailure

        However, when I run just the HBaseWriter test, I also get an error:

        testWriters(org.apache.hadoop.chukwa.datacollection.writer.TestHBaseWriter)  Time elapsed: 0.026 sec  <<< ERROR!
        java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/Stoppable
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at org.apache.hadoop.chukwa.datacollection.writer.TestHBaseWriter.setUp(TestHBaseWriter.java:65)
        at junit.framework.TestCase.runBare(TestCase.java:132)
        at junit.framework.TestResult$1.protect(TestResult.java:110)
        at junit.framework.TestResult.runProtected(TestResult.java:128)
        at junit.framework.TestResult.run(TestResult.java:113)
        at junit.framework.TestCase.run(TestCase.java:124)
        at junit.framework.TestSuite.runTest(TestSuite.java:243)
        at junit.framework.TestSuite.run(TestSuite.java:238)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
        at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        > Does Gora support Hadoop1?

        Yes, we have shim-layer support for Hadoop 1.2.1 and 2.5.2. We just need to configure it properly.

        > We probably need to setup another profile for enabling Hadoop 1 vs Hadoop 2.

        Most likely, Eric. I will look into this. I see that there are existing profiles for 0.96.2-hadoop1 and 0.94.9.
        I am therefore thinking that we could set up profiles for the Gora-supported backends: HBase 0.98.8-hadoop2, Cassandra 2.0.2, Solr 4.10.3, MongoDB 2.6 and Accumulo 1.5.1. Before I do this, however, I need clarification on:

        • which version of HBase is currently activated by default in Gora?
        • how and where is this defined as the default within pom.xml?

        > Can Gora map sequence ID value to column name in HBase?

        Mmmmmm... I am not sure about this. My reasoning is as follows: currently we define, AHEAD OF MAPPING,

        • a table name
        • columns within this table (columns can also have optional params like compression, bloom filters, etc.)

        and we then map our data beans to these definitions.

        It appears to me that, for us to map sequenceID to a column name, we would 1) want to dynamically create many, many columns over time, directly dependent on the number of data chunks we receive; is this correct? 2) when a new data chunk arrives, want to dynamically generate a new column within the existing table, with the sequenceID as the column name; is this correct?

        Thanks

        lewismc Lewis John McGibbney added a comment -

        Patch v2 adds mapping files for the other Gora backends, as well as their gora.properties files. Issues remain: the design questions currently under discussion with Eric Yang, and failing tests.
        Right now I can't even get the full Chukwa test suite to run on my laptop.

        eyang Eric Yang added a comment -

        We probably want to support HBase 0.98.8 or newer to stay current with the open source releases. The NoClassDefFoundError indicates that the HBase jar file is not on the test classpath. 1) Yes, use a column for each entry. This provides a cursor for where data is fetched, if we need to scroll between entries. 2) Yes, that is correct: we dynamically generate a new column and use the sequence ID as the column name.
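        To make the layout in this reply concrete, here is a minimal in-memory model using plain Java collections, not the HBase or Gora API: each row holds a sorted map from sequence ID to log line, standing in for column qualifiers created on demand, and `scrollAfter` treats the last sequence ID a reader saw as its cursor. `WideRowSketch` and its methods are hypothetical names for illustration, and the log lines in `main` are sample data.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class WideRowSketch {
  // row key -> (sequence ID -> log line). The inner sorted map stands in for the
  // column qualifiers created on the fly inside the "log" column family.
  private final TreeMap<String, TreeMap<Long, String>> table = new TreeMap<>();

  /** Stores one entry; its "column" is created on demand, named by the sequence ID. */
  void put(String rowKey, long seqId, String line) {
    table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(seqId, line);
  }

  /** Returns up to {@code limit} entries whose sequence ID is strictly greater than {@code cursor}. */
  SortedMap<Long, String> scrollAfter(String rowKey, long cursor, int limit) {
    SortedMap<Long, String> page = new TreeMap<>();
    TreeMap<Long, String> row = table.get(rowKey);
    if (row == null) {
      return page;
    }
    for (Map.Entry<Long, String> e : row.tailMap(cursor, false).entrySet()) {
      if (page.size() >= limit) {
        break;
      }
      page.put(e.getKey(), e.getValue());
    }
    return page;
  }

  public static void main(String[] args) {
    WideRowSketch t = new WideRowSketch();
    String row = "2132013102:TT:host1.example.com";
    t.put(row, 1230L, "2013-01-23 12:01:30 INFO This is a log entry.");
    t.put(row, 1231L, "2013-01-23 12:01:31 INFO Sample entry two.");
    t.put(row, 1232L, "2013-01-23 12:01:32 INFO Sample entry three.");
    // Resume after the last entry the reader saw (sequence ID 1230).
    System.out.println(t.scrollAfter(row, 1230L, 10).keySet()); // [1231, 1232]
  }
}
```

        In HBase itself, qualifiers sort as raw bytes, so a non-negative sequence ID would need a fixed-width big-endian encoding (for example HBase's `Bytes.toBytes(long)`) for the scan order to match numeric order.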

        Do the Chukwa unit tests run on your laptop without the patch?

        lewismc Lewis John McGibbney added a comment -

        Hi Eric Yang, regarding 2), I've taken this to the Gora dev@ list for discussion, as this is not something I've needed to date.
        I'll update here with the status of the tests without the patch.
        Thanks
        Lewis

        eyang Eric Yang added a comment -

        Any news from Gora dev? I recently updated Chukwa to work with Hadoop 2.6.0 and HBase 1.0.0. Would you mind rebasing the patch onto those versions? Thanks

        lewismc Lewis John McGibbney added a comment -

        Hi Eric,
        Yes, I am fixing a major bug in gora-cassandra, which is something of a blocker: nested, persisted Union records (e.g. super-column nested Union records) are not being persisted properly.
        HBase is stable.
        I would say: push the Chukwa release. Once we fix the bug and release Gora 0.6.1, we can progress.
        This is an active and beneficial issue.
        Thanks
        Lewis

        eyang Eric Yang added a comment - - edited

        Thanks Lewis,

        A few suggestions,

        1. We need to include hbase-client for the test case to pass:

                 <dependency>
                   <groupId>org.apache.hbase</groupId>
                   <artifactId>hbase-client</artifactId>
                   <version>${hbase.version}</version>
                 </dependency>

        2. gora.properties is better hosted in the conf directory instead of src/main/resources. This allows users to configure it at deployment time instead of hardcoding it into the jar file.

        3. We may want to generate two gora.properties files, one for test cases and one for release. The test one can run with an in-memory database to reduce test running time. The production one is preconfigured with HBase to make it easier for newcomers to adopt this solution.

        4. We probably want a developer guide for GoraWriter. It is really powerful, enriching Chukwa's ability to write to different storage systems, and a tutorial could help new developers.

        5. I encountered an issue when I configured gora.properties to write to HBase from the Chukwa agent. I get this error:

        2015-03-14 14:12:59.451 java[11075:636025] Unable to load realm info from SCDynamicStore
        Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
        	at org.apache.gora.hbase.store.HBaseMapping$HBaseMappingBuilder.build(HBaseMapping.java:174)
        	at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:811)
        	at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:116)
        	at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:101)
        	at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:160)
        	at org.apache.gora.store.DataStoreFactory.getDataStore(DataStoreFactory.java:277)
        	at org.apache.hadoop.chukwa.datacollection.writer.gora.GoraWriter.init(GoraWriter.java:67)
        	at org.apache.hadoop.chukwa.datacollection.writer.gora.GoraWriter.<init>(GoraWriter.java:53)
        	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        	at java.lang.Class.newInstance(Class.java:374)
        	at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.init(PipelineStageWriter.java:100)
        	at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.<init>(PipelineStageWriter.java:48)
        	at org.apache.hadoop.chukwa.datacollection.connector.PipelineConnector.start(PipelineConnector.java:87)
        	at org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent.main(ChukwaAgent.java:292)
        

        This is what I added to gora.properties:

        gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
        gora.datastore.autocreateschema=true
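
        Suggestions 2 and 3 could be combined by layering the configuration. The following is only an illustrative sketch under stated assumptions, not code from the patch: the class name GoraPropertiesLoader and the use of a CHUKWA_CONF_DIR environment variable to locate the conf directory are hypothetical, and it calls no Gora API. It starts from profile defaults (Gora's in-memory MemStore for tests, the HBaseStore settings quoted above for release), lets a gora.properties bundled on the classpath override them, and lets a gora.properties in the conf directory override both, so operators can switch datastores without rebuilding the jar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class GoraPropertiesLoader {

  static final String FILE_NAME = "gora.properties";

  // Test profile: Gora's in-memory MemStore, so no HBase is needed to run tests.
  static Properties testDefaults() {
    Properties p = new Properties();
    p.setProperty("gora.datastore.default", "org.apache.gora.memory.store.MemStore");
    return p;
  }

  // Release profile: the HBase settings quoted in the comment.
  static Properties releaseDefaults() {
    Properties p = new Properties();
    p.setProperty("gora.datastore.default", "org.apache.gora.hbase.store.HBaseStore");
    p.setProperty("gora.datastore.autocreateschema", "true");
    return p;
  }

  // Precedence, lowest to highest: profile defaults, a gora.properties on the
  // classpath (e.g. from src/main/resources), then <confDir>/gora.properties.
  static Properties load(Path confDir, Properties defaults) throws IOException {
    Properties props = new Properties(defaults); // unset keys fall back to defaults
    try (InputStream in =
        GoraPropertiesLoader.class.getClassLoader().getResourceAsStream(FILE_NAME)) {
      if (in != null) {
        props.load(in);
      }
    }
    if (confDir != null) {
      Path file = confDir.resolve(FILE_NAME);
      if (Files.isRegularFile(file)) {
        try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
          props.load(r); // keys loaded here overwrite those loaded from the classpath
        }
      }
    }
    return props;
  }

  public static void main(String[] args) throws IOException {
    // Assumption: CHUKWA_CONF_DIR names the deployment's conf directory.
    String dir = System.getenv("CHUKWA_CONF_DIR");
    Properties props = load(dir == null ? null : Paths.get(dir), releaseDefaults());
    System.out.println("gora.datastore.default = "
        + props.getProperty("gora.datastore.default"));
  }
}
```

        Using java.util.Properties defaults means any key the deployment does not set silently falls back to the profile value. A real GoraWriter would still have to hand the resulting Properties to Gora; which DataStoreFactory overload accepts them should be checked against the Gora version in use.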
        

        I am not sure whether this error was caused by the default Chukwa agent attempting to write system metrics into HBase through Gora. This raises an interesting question about how we want to map data types to writers.
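
        On the NoSuchMethodError itself: the trailing V in the descriptor (Lorg/apache/hadoop/hbase/HColumnDescriptor;)V means gora-hbase was compiled against an HTableDescriptor.addFamily that returns void. The JVM links calls by exact descriptor, so if the HBase jar the agent actually loads declares addFamily with a different return type, the call fails at runtime even though the source would still compile. My understanding, not verified against this build, is that HBase 1.0 made HTableDescriptor's setters builder-style (returning the descriptor), which would produce exactly this error if gora-hbase built for an older HBase runs against 1.x jars. The hypothetical MethodDescriptors utility below uses only plain Java reflection to print which jar a class came from and the descriptors of its public methods, so they can be compared with the one in the stack trace.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class MethodDescriptors {

  // JVM type descriptor, e.g. int -> "I", String -> "Ljava/lang/String;".
  static String typeDescriptor(Class<?> c) {
    if (c == void.class) return "V";
    if (c == boolean.class) return "Z";
    if (c == byte.class) return "B";
    if (c == char.class) return "C";
    if (c == short.class) return "S";
    if (c == int.class) return "I";
    if (c == long.class) return "J";
    if (c == float.class) return "F";
    if (c == double.class) return "D";
    // Array names are already in descriptor form ("[I", "[Ljava.lang.String;").
    if (c.isArray()) return c.getName().replace('.', '/');
    return "L" + c.getName().replace('.', '/') + ";";
  }

  // Method descriptor: "(" + parameter descriptors + ")" + return descriptor.
  static String methodDescriptor(Method m) {
    StringBuilder sb = new StringBuilder("(");
    for (Class<?> p : m.getParameterTypes()) {
      sb.append(typeDescriptor(p));
    }
    return sb.append(')').append(typeDescriptor(m.getReturnType())).toString();
  }

  // Descriptors of every public method with the given name on the loaded class.
  static List<String> descriptorsOf(Class<?> owner, String name) {
    List<String> out = new ArrayList<>();
    for (Method m : owner.getMethods()) {
      if (m.getName().equals(name)) {
        out.add(methodDescriptor(m));
      }
    }
    return out;
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Example arguments: org.apache.hadoop.hbase.HTableDescriptor addFamily
    String className = args.length > 0 ? args[0] : "java.lang.Object";
    String methodName = args.length > 1 ? args[1] : "equals";
    Class<?> owner = Class.forName(className);
    // Code source is null for JDK bootstrap classes, a jar location otherwise.
    System.out.println(owner.getName() + " loaded from "
        + owner.getProtectionDomain().getCodeSource());
    for (String d : descriptorsOf(owner, methodName)) {
      System.out.println(methodName + d);
    }
  }
}
```

        Run with the agent's classpath and the arguments org.apache.hadoop.hbase.HTableDescriptor addFamily. If (Lorg/apache/hadoop/hbase/HColumnDescriptor;)V is missing from the output, the HBase jars on that classpath do not match what gora-hbase was compiled against.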

        eyang Eric Yang added a comment -

        Rebase patch to current trunk.

        eyang Eric Yang added a comment -

        Lewis, do you want to rebase it onto Gora 0.6.1 or newer? I am fine with committing this as-is for the 0.7.0 release.

        lewismc Lewis John McGibbney added a comment -

        Hi Eric,
        I am happy to commit the code; however, it does not address an inherent
        gap in Gora's functionality, namely support for dynamic data structures.
        This has been discussed and is logged as a Gora ticket.
        I am in the process of upgrading Gora to work with HBase 1.1.4, so I think
        we should leave this for the time being and keep it as a feature in Chukwa
        trunk. For example, you can persist Chukwa log data into Cassandra. It is
        powerful, but it is not ready yet!

        On Saturday, December 12, 2015, Eric Yang (JIRA) <jira@apache.org


        Lewis

        eyang Eric Yang added a comment -

        The current code doesn't break Chukwa, so I will commit it to avoid having to rebase again. This JIRA will be left open with the fix version unset. Sounds good?

        lewismc Lewis John McGibbney added a comment -

        Excellent, Eric.


        Lewis

        hudson Hudson added a comment -

        FAILURE: Integrated in Chukwa-master #546 (See https://builds.apache.org/job/Chukwa-master/546/)
        CHUKWA-734. Added GoraWriter. (Lewis John McGibbney via Eric Yang) (eyang: rev cac335a10c8f9b0ce09cc4dc5bead940e07f78bd)

        • src/main/java/org/apache/hadoop/chukwa/datacollection/writer/gora/GoraWriter.java
        • src/main/java/org/apache/hadoop/chukwa/datacollection/writer/gora/ChukwaChunk.java
        • src/main/resources/gora-hbase-mapping.xml
        • src/main/resources/gora-accumulo-mapping.xml
        • src/main/resources/chukwachunk.json
        • src/main/resources/gora-solr-mapping.xml
        • CHANGES.txt
        • src/main/resources/gora.properties
        • pom.xml
        • src/main/resources/gora-cassandra-mapping.xml
        • src/main/resources/gora-mongodb-mapping.xml
        • src/main/java/org/apache/hadoop/chukwa/datacollection/writer/gora/package-info.java
          CHUKWA-734. Added GoraWriter. (Lewis John McGibbney via Eric Yang) (eyang: rev b9fa3b25ac5d81d4b850f7029a92f370e995f945)
        • conf/chukwa-agent-conf.xml

          People

          • Assignee:
            lewismc Lewis John McGibbney
            Reporter:
            lewismc Lewis John McGibbney
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated: 5h
              Remaining: 5h
              Logged: Not Specified

                Development