Flume / FLUME-2463

Add support for Hive and HBase datasets to DatasetSink

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels:
      None

      Description

      The current configuration only allows the DatasetSink to write to HDFS, but datasets commonly have Hive or HBase URIs. Adding the kite-data-hive and kite-data-hbase dependencies lets the DatasetSink write to those stores as well.
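      A minimal sketch of the idea, assuming Kite's `dataset:<scheme>:...` URI form: the hypothetical helper `DatasetUriModules.moduleFor` (not part of Flume or Kite) reports which Kite module would need to be on the DatasetSink's classpath for a given dataset URI. The scheme-to-module mapping is an assumption drawn from this issue; in particular, the comments below note that at Kite 0.15.0 Hive support is published as `kite-data-hcatalog`, which is being renamed `kite-data-hive` upstream.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Flume or Kite): guesses which Kite module
 * must be on the DatasetSink classpath for a given dataset URI.
 */
final class DatasetUriModules {

  private DatasetUriModules() {}

  static String moduleFor(String datasetUri) {
    // Assumed URI shape: "dataset:<storage scheme>:<scheme-specific part>".
    String prefix = "dataset:";
    if (!datasetUri.startsWith(prefix)) {
      throw new IllegalArgumentException("Not a dataset URI: " + datasetUri);
    }
    String rest = datasetUri.substring(prefix.length());
    int colon = rest.indexOf(':');
    if (colon <= 0) {
      throw new IllegalArgumentException("Missing storage scheme: " + datasetUri);
    }
    String scheme = rest.substring(0, colon).toLowerCase(Locale.ROOT);
    switch (scheme) {
      case "hdfs":
      case "file":
        // Filesystem datasets were already supported before this issue.
        return "kite-data-core";
      case "hive":
        // At Kite 0.15.0 the Hive module is kite-data-hcatalog (see comments).
        return "kite-data-hcatalog";
      case "hbase":
        return "kite-data-hbase";
      default:
        throw new IllegalArgumentException("Unknown storage scheme: " + scheme);
    }
  }

  public static void main(String[] args) {
    System.out.println(moduleFor("dataset:hive:default/events"));
    System.out.println(moduleFor("dataset:hbase:zk1,zk2/events"));
  }
}
```

      Running `main` prints `kite-data-hcatalog` followed by `kite-data-hbase`.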

      Attachments

      1. FLUME-2463-1.patch (3 kB, Ryan Blue)
      2. FLUME-2463-2.patch (3 kB, Ryan Blue)
      3. FLUME-2463-3.patch (3 kB, Ryan Blue)
      4. FLUME-2463-4.patch (3 kB, Ryan Blue)
      5. FLUME-2463-5.patch (3 kB, Ryan Blue)
      6. FLUME-2463-6.patch (3 kB, Ryan Blue)

        Activity

        Hudson added a comment:

        FAILURE: Integrated in Flume-trunk-hbase-98 #23 (See https://builds.apache.org/job/Flume-trunk-hbase-98/23/)
        FLUME-2463. Add Hive and HBase dataset support in DatasetSink. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=6d0243112fa1ff1cb796ebe158630ae681a2afc8)

        • flume-ng-sinks/flume-dataset-sink/pom.xml
        • pom.xml
        Hudson added a comment:

        UNSTABLE: Integrated in flume-trunk #663 (See https://builds.apache.org/job/flume-trunk/663/)
        FLUME-2463. Add Hive and HBase dataset support in DatasetSink. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=6d0243112fa1ff1cb796ebe158630ae681a2afc8)

        • flume-ng-sinks/flume-dataset-sink/pom.xml
        • pom.xml
        Hari Shreedharan added a comment:

        Committed! Thanks Ryan!

        ASF subversion and git services added a comment:

        Commit 89e0e53ee60840fc0a50bf62085b83384b754f9f in flume's branch refs/heads/flume-1.6 from Hari Shreedharan
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=89e0e53 ]

        FLUME-2463. Add Hive and HBase dataset support in DatasetSink.

        (Ryan Blue via Hari)

        ASF subversion and git services added a comment:

        Commit 6d0243112fa1ff1cb796ebe158630ae681a2afc8 in flume's branch refs/heads/trunk from Hari Shreedharan
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=6d02431 ]

        FLUME-2463. Add Hive and HBase dataset support in DatasetSink.

        (Ryan Blue via Hari)

        Hari Shreedharan added a comment:

        +1. Committing

        Ryan Blue added a comment:

        This fixes the test failures (the tests pass locally). They were caused by Hive shading its own version of Avro.
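        The NoSuchMethodError reported below fits that diagnosis: the `org.apache.avro.Schema` that got loaded lacks `getJsonProp(String)` returning `org.codehaus.jackson.JsonNode`, so a different copy of the class won on the classpath. A small diagnostic sketch, not part of Flume or Kite: the hypothetical `ClassOrigin.whereLoaded` reports the jar or directory a class was loaded from. Note that it only shows the copy that won, not every jar that contains the class.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

/** Hypothetical diagnostic helper: where did the JVM load a class from? */
final class ClassOrigin {

  private ClassOrigin() {}

  static String whereLoaded(Class<?> type) {
    ProtectionDomain domain = type.getProtectionDomain();
    CodeSource source = domain == null ? null : domain.getCodeSource();
    URL location = source == null ? null : source.getLocation();
    // Bootstrap classes such as java.lang.String report no code source.
    return location == null ? "bootstrap" : location.toString();
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Hypothetical usage with the failing test's classpath plus this class:
    //   java -cp <test classpath>:. ClassOrigin org.apache.avro.Schema
    String[] names = args.length > 0 ? args : new String[] {"java.lang.String"};
    for (String name : names) {
      System.out.println(name + " -> " + whereLoaded(Class.forName(name)));
    }
  }
}
```

        If the printed location for `org.apache.avro.Schema` is a Hive jar rather than the Avro jar, the shaded copy is shadowing the Avro version Kite expects.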

        Hari Shreedharan added a comment:

        Tests are failing with a bunch of NoSuchMethodErrors:

        Running org.apache.flume.sink.kite.TestDatasetSink
        Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.356 sec <<< FAILURE!
        org.apache.flume.sink.kite.TestDatasetSink  Time elapsed: 355 sec  <<< ERROR!
        java.lang.NoSuchMethodError: org.apache.avro.Schema.getJsonProp(Ljava/lang/String;)Lorg/codehaus/jackson/JsonNode;
        	at org.kitesdk.data.spi.PartitionStrategyParser.hasEmbeddedStrategy(PartitionStrategyParser.java:105)
        	at org.kitesdk.data.DatasetDescriptor$Builder.build(DatasetDescriptor.java:866)
        	at org.apache.flume.sink.kite.TestDatasetSink.<clinit>(TestDatasetSink.java:95)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:606)
        	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
        	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
        	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
        	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
        	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
        	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
        	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
        	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:606)
        	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
        	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
        	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
        	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
        	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
        
        org.apache.flume.sink.kite.TestDatasetSink  Time elapsed: 355 sec  <<< ERROR!
        java.lang.NoClassDefFoundError: Could not initialize class org.apache.flume.sink.kite.TestDatasetSink
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:606)
        	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
        	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
        	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:36)
        	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
        	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
        	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
        	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:606)
        	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
        	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
        	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
        	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
        	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
        
        
        Results :
        
        Tests in error: 
          org.apache.flume.sink.kite.TestDatasetSink: org.apache.avro.Schema.getJsonProp(Ljava/lang/String;)Lorg/codehaus/jackson/JsonNode;
          org.apache.flume.sink.kite.TestDatasetSink: Could not initialize class org.apache.flume.sink.kite.TestDatasetSink
        
        
        Ryan Blue added a comment:

        This patch uses the correct artifact id, kite-data-hcatalog. That's being migrated to kite-data-hive upstream.

        Hari Shreedharan added a comment:

        Build fails with:

        [ERROR] Failed to execute goal on project flume-dataset-sink: Could not resolve dependencies for project org.apache.flume.flume-ng-sinks:flume-dataset-sink:jar:1.6.0-SNAPSHOT: Could not find artifact org.kitesdk:kite-data-hive:jar:0.15.0 in cdh.repo (https://repository.cloudera.com/artifactory/cloudera-repos) -> [Help 1]
        [ERROR] 
        [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
        [ERROR] Re-run Maven using the -X switch to enable full debug logging.
        [ERROR] 
        [ERROR] For more information about the errors and possible solutions, please read the following articles:
        [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
        [ERROR] 
        [ERROR] After correcting the problems, you can resume the build with the command
        [ERROR]   mvn <goals> -rf :flume-dataset-sink
        

        The artifacts are not in the repo?
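        One way to check that by hand is to build the artifact's URL from its coordinate using the standard Maven 2 repository layout: the groupId with dots turned into slashes, then `artifactId/version/artifactId-version.extension`. A sketch with a hypothetical `MavenRepoPath` class; it ignores classifiers and assumes the packaging type equals the file extension, which holds for `jar`.

```java
/** Hypothetical helper: Maven coordinate -> path in a Maven 2 layout repository. */
final class MavenRepoPath {

  private MavenRepoPath() {}

  /** Expects "groupId:artifactId:type:version" (no classifier). */
  static String relativePath(String coordinate) {
    String[] parts = coordinate.split(":");
    if (parts.length != 4) {
      throw new IllegalArgumentException(
          "Expected groupId:artifactId:type:version, got " + coordinate);
    }
    String groupPath = parts[0].replace('.', '/');
    String artifactId = parts[1];
    String extension = parts[2]; // assumed equal to the type, true for "jar"
    String version = parts[3];
    return groupPath + "/" + artifactId + "/" + version + "/"
        + artifactId + "-" + version + "." + extension;
  }

  public static void main(String[] args) {
    // Repository URL and coordinate taken from the build failure above.
    String base = "https://repository.cloudera.com/artifactory/cloudera-repos/";
    System.out.println(base + relativePath("org.kitesdk:kite-data-hive:jar:0.15.0"));
  }
}
```

        Running `main` prints the URL at which `org.kitesdk:kite-data-hive:jar:0.15.0` would have to exist for the build above to resolve it from cdh.repo.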

        Ryan Blue added a comment:

        This one should work. There was a copy/paste error. Sorry about that!

        Ryan Blue added a comment:

        Here's a new patch rebased on the current trunk. Oddly enough, the rebase went smoothly and the patch looks correct.

        Hari Shreedharan added a comment:

        Build fails with:

        [ERROR] The build could not read 1 project -> [Help 1]
        [ERROR]   
        [ERROR]   The project  (/Users/hshreedharan/work/flume-latest/flume/pom.xml) has 1 error
        [ERROR]     Non-parseable POM /Users/hshreedharan/work/flume-latest/flume/pom.xml: end tag name </kite.version> must be the same as start tag <hive.version> from line 55 (position: TEXT seen ...<hive.version>0.10.0</kite.version>... @55:40)  @ line 55, column 40 -> [Help 2]
        [ERROR] 
        [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
        [ERROR] Re-run Maven using the -X switch to enable full debug logging.
        [ERROR] 
        [ERROR] For more information about the errors and possible solutions, please read the following articles:
        [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
        [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/ModelParseException
        

        I think the patch needs to be rebased.
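        The failure itself is an XML well-formedness error: the `<hive.version>` start tag on line 55 is closed by `</kite.version>`. A sketch of a quick pre-Maven sanity check that uses only the JDK's built-in DOM parser (`javax.xml.parsers`); the hypothetical `PomWellFormedness.firstError` returns `null` for well-formed XML and a `line:column message` string otherwise. The JDK parser words its messages differently from Maven.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical pre-build check: is a POM (or POM fragment) well-formed XML? */
final class PomWellFormedness {

  private PomWellFormedness() {}

  /** Returns null if {@code xml} is well-formed, else "line:column message". */
  static String firstError(String xml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // Replace the default handler, which prints "[Fatal Error]" lines to stderr.
      builder.setErrorHandler(new ErrorHandler() {
        @Override
        public void warning(SAXParseException e) {}

        @Override
        public void error(SAXParseException e) {}

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
          throw e; // Mismatched tags are reported as fatal errors.
        }
      });
      builder.parse(new InputSource(new StringReader(xml)));
      return null;
    } catch (SAXParseException e) {
      return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not run the XML parser", e);
    }
  }

  public static void main(String[] args) {
    // Same tag mismatch as line 55 of the broken pom.xml above.
    String broken = "<properties>\n  <hive.version>0.10.0</kite.version>\n</properties>";
    System.out.println(firstError(broken));
  }
}
```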

        Joey Echeverria added a comment:

        +1 (non-binding)

        Ryan Blue added a comment:

        Updated patch that sets up dependency versions correctly. Thanks to Joey for pointing out the error!

        Ryan Blue added a comment:

        This patch updates the POM files to add Kite's Hive and HBase support, along with the required runtime dependencies.


          People

          • Assignee:
            rdblue Ryan Blue
            Reporter:
            rdblue Ryan Blue
          • Votes:
            1
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:
