Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
None
None
None
Apache Gobblin 170724, Apache Gobblin 170807, Apache Gobblin 170821, Apache Gobblin 170905
Description
https://github.com/linkedin/gobblin/blob/master/gobblin-docs/case-studies/Kafka-HDFS-Ingestion.md
The guide contains:
source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka
and we get:
Exception in thread "main" java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.KafkaSimpleSource
because the class was moved in commit 130afec71, "Moving Kafka dependencies out into version specific modules (#1417)", and the gobblin-modules module it was moved into does not seem to build at all.
Github Url : https://github.com/linkedin/gobblin/issues/1483
Github Reporter : wosiu
Github Assignee : shirshanka
Github Created At : 2016-12-22T00:11:10Z
Github Updated At : 2017-02-14T23:36:32Z
Comments
chavdar wrote on 2016-12-22T04:38:29Z : Hi Michal,
Are you running gobblin-example, gobblin-distribution, or a different packaging?
Thanks.
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-268716836
panagiotious wrote on 2016-12-24T23:18:30Z : I think I am facing the same problem:
```
$ bin/gobblin-mapreduce.sh --workdir gobblin-dirs/work --conf ~/gobblin/config/kafka1.pull
[...Hadoop gossip...]
Exception in thread "main" java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.KafkaSimpleSource
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:195)
at gobblin.runtime.JobContext.createSource(JobContext.java:216)
at gobblin.runtime.JobContext.<init>(JobContext.java:148)
at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:140)
at gobblin.runtime.mapreduce.MRJobLauncher.<init>(MRJobLauncher.java:130)
at gobblin.runtime.mapreduce.CliMRJobLauncher.<init>(CliMRJobLauncher.java:54)
at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
```
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269103855
shirshanka wrote on 2016-12-25T00:53:21Z : @panagiotious: How are you pulling in the Gobblin jars?
You need the gobblin-kafka-08 jar on your classpath.
gobblin-core is supposed to pull this in transitively.
https://mvnrepository.com/artifact/com.linkedin.gobblin/gobblin-kafka-08/0.9.0
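To confirm whether a class such as `KafkaSimpleSource` is actually on a given classpath, and which jar provides it, a small standalone check like the one below could help. This is a minimal sketch, not part of Gobblin; `ClasspathCheck` is a hypothetical helper. It only inspects the classpath passed to `java`, not the jars that gobblin-mapreduce.sh ships to the cluster.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not part of Gobblin): reports whether a class can be
 * loaded from the current classpath and where it was loaded from.
 */
public class ClasspathCheck {

    /**
     * Returns the jar or directory URL the class was loaded from, "<bootstrap>"
     * for JDK classes that have no code source, or null if the class cannot be loaded.
     */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> clazz = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no CodeSource
            if (source == null || source.getLocation() == null) {
                return "<bootstrap>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is missing, or its superclass/interfaces could not be loaded
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0
            ? args
            : new String[] {"gobblin.source.extractor.extract.kafka.KafkaSimpleSource"};
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

For example, `java -cp "gobblin-dist/lib/*:." ClasspathCheck gobblin.source.extractor.extract.kafka.KafkaSimpleSource` would be expected to print the gobblin-kafka-08 jar's location if that jar is present in `lib/`.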
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269106010
panagiotious wrote on 2016-12-25T03:32:17Z : After extracting the tarball created by the `./gradlew clean assemble` command (`build` fails on the `metastore` tests for some odd reason I still cannot debug, but the documentation does not reference any dependency for them), I can see `gobblin-dist/lib/gobblin-kafka-08-0.9.0-24-gec7d3a2.jar` and `gobblin-dist/lib/gobblin-kafka-common-0.9.0-24-gec7d3a2.jar` in the `lib/` directory. I assume that directory is added to the classpath when I submit the job; otherwise it would not be able to find any jar.
Weirdly enough, if I copy `lib/gobblin-core-0.8.0.jar` that contains the class `KafkaSimpleSource` from the distribution of v0.8.0 (the downloaded tarball file) to my `lib` directory, the aforementioned error (missing class) does not appear, but the job hangs and is not submitted to my cluster.
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269108794
shirshanka wrote on 2016-12-25T07:31:30Z : Is Kafka ingestion working for you in standalone mode? (http://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/#standalone)
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269113024
panagiotious wrote on 2016-12-25T07:50:25Z : Yes, perfectly! I have tried it both with the Wikipedia example and our Kafka topics.
I just realized that the `assemble` command has been giving me an incomplete distribution. I have been using Java 7, which Cloudera Manager deploys to our edge nodes and which, it turns out, is not supported by Gobblin 0.8.0; it makes the `metastore` tests fail and (probably) causes some jars to silently fail to build and be left out of the final tarball.
I have successfully assembled Gobblin with Java 8 on a different host, but it will not run on our edge nodes. The closest I got, by copying jars from version 0.8.0 into the manually assembled version 0.9.0, was getting the job submitted, only for it to fail with `Error: java.lang.ClassNotFoundException: gobblin.metrics.MetricContext at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at [...]`.
At this point I should note that with Java 7 a `build` fails at the `metastore` tests, whereas with Java 8 it gets past those but fails on tests that make extraordinary assumptions, like ZooKeeper running on the host; the host has `iptables` rules that block any unexpected communication.
I do not understand why `assemble` would yield a different distribution than `build`, but I am pretty confident that that's the case when tests are failing. The command I've been using has been `./gradlew clean assemble -PuseHadoop2 -PhadoopVersion=2.6.0-cdh5.8.3 --stacktrace --info` in all my builds. My config file is the MR example with Kafka (with changes in the hosts of course).
I will keep trying, although if there is no Java 7 compatibility, I am not confident I can get this up and running with CDH on our cluster.
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269113431
wosiu wrote on 2016-12-25T12:43:55Z : Fixed for me; #1490 works.
FYI, the way I build:
```
./gradlew -PhadoopVersion=2.6.0-cdh5.4.3 -q clean assemble
```
and run:
```
LPATH=/home/michalw/gobblin-dist/lib
BUILD=0.9.0-27-gf497089
./gobblin-dist/bin/gobblin-mapreduce.sh \
--jt yarnrm \
--fs hdfs://logs-hdfs-nameserivce \
--conf gobblin-job-config/gobblin-mr-ingestion.properties \
--logdir gobblin-logs \
--workdir /home/michalw/gobblin-work \
--jars $LPATH/guava-retrying-2.0.0.jar,$LPATH/kafka-avro-serializer-2.0.1.jar,$LPATH/kafka-json-serializer-2.0.1.jar,$LPATH/hadoop-common-2.6.0-cdh5.4.3.jar,$LPATH/gobblin-metrics-base-$BUILD.jar,$LPATH/gobblin-metrics-$BUILD.jar,$LPATH/gobblin-core-base-$BUILD.jar,$LPATH/gobblin-kafka-08-$BUILD.jar,$LPATH/gobblin-kafka-common-$BUILD.jar
```
Unfortunately, I need to specify all those jars by hand, but it works.
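To avoid typing the jar list by hand, the value for `--jars` could be generated from the lib directory. This is a minimal sketch, assuming it is acceptable to ship every jar in `lib/` with the job; that may include jars (Hadoop's own, for instance) that conflict with the cluster's versions, so some filtering could still be needed. `JarListBuilder` is a hypothetical helper, not part of Gobblin.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: joins every *.jar in a directory into one comma-separated list. */
public class JarListBuilder {

    public static String buildJarList(Path libDir) throws IOException {
        List<String> jars = new ArrayList<>();
        // The glob is matched against file names, so only entries ending in ".jar" are kept
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        // Directory iteration order is unspecified; sort for a stable, diffable result
        Collections.sort(jars);
        return String.join(",", jars);
    }

    public static void main(String[] args) throws IOException {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "gobblin-dist/lib");
        System.out.println(buildJarList(libDir));
    }
}
```

Used from the launch command, this would look like `--jars "$(java JarListBuilder "$LPATH")"`.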
@panagiotious let me know if I may close this one.
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269121168
panagiotious wrote on 2016-12-25T23:52:26Z : Yes, this looks like it works. I assumed the `lib/` directory would already be included in the job's jar dependencies, but I guess we need to specify the jars explicitly.
Thank you!
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269142272
shirshanka wrote on 2016-12-28T08:04:11Z : Looks like gobblin-mapreduce.sh is selectively pulling in specific jars for its runtime deps from /lib.
https://github.com/linkedin/gobblin/blob/master/bin/gobblin-mapreduce.sh#L130
That's what broke your jobs. We'll figure out the best maintainable solution long term and update the script. Let's keep this issue open.
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269441365
wosiu wrote on 2016-12-28T09:02:32Z : OK, just for reference, I created this some time ago:
https://github.com/linkedin/gobblin/issues/1466
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269447676
wosiu wrote on 2017-01-05T03:53:17Z : Meanwhile: is it possible to build Gobblin so that the jar names have no version infix?
For example, instead of:
gobblin-yarn-0.9.0-28-gcb609f2.jar
we would have:
gobblin-yarn.jar
I'm aware I can rename them after the build, but maybe there is already a knob for this in your Gradle config?
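Whether or not the Gradle build offers such a knob, the post-build rename could be scripted. This is a minimal sketch, assuming jar names follow the `<artifact>-<X.Y.Z>[-<commits>-g<sha>].jar` pattern seen in this thread; `JarNames.stripVersion` is a hypothetical helper, and any launcher script that expects versioned file names would need a matching change.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: maps a versioned jar name such as gobblin-yarn-0.9.0-28-gcb609f2.jar
 * to an unversioned one such as gobblin-yarn.jar.
 */
public class JarNames {

    // <artifact>-<major>.<minor>.<patch>[-<commits>-g<sha>].jar, with a lazy artifact group
    // so that names like gobblin-kafka-08-0.9.0.jar keep the "-08" in the artifact name
    private static final Pattern VERSIONED =
        Pattern.compile("^(.+?)-\\d+\\.\\d+\\.\\d+(?:-\\d+-g[0-9a-f]+)?\\.jar$");

    public static String stripVersion(String jarName) {
        Matcher m = VERSIONED.matcher(jarName);
        // Names that do not follow the pattern (e.g. ...-cdh5.4.3.jar) are returned unchanged
        return m.matches() ? m.group(1) + ".jar" : jarName;
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + stripVersion(name));
        }
    }
}
```

With this pattern, `gobblin-kafka-08-0.9.0-24-gec7d3a2.jar` maps to `gobblin-kafka-08.jar`, while `hadoop-common-2.6.0-cdh5.4.3.jar` is left alone because its suffix does not match.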
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-270557996
anshuGithubData wrote on 2017-02-01T13:05:23Z : Hello everyone,
I was also facing similar issues.
Following what wosiu described, I tried the steps below.
(1)
When I run assemble with the following command, `./gradlew -PhadoopVersion=2.6.0-cdh5.9.0 -q clean assemble`, it fails with the following error:
Task failed with an exception.
-----------
- What went wrong:
Execution failed for task ':gobblin-api:javadoc'.
> Javadoc generation failed. Generated Javadoc options file (useful for troubleshooting): '/home/cdhuser/gobblin/build/gobblin-api/tmp/javadoc/javadoc.options'
- Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
==============================================================================
Task failed with an exception.
-----------
- What went wrong:
Execution failed for task ':gobblin-rest-service:gobblin-rest-client:javadoc'.
> Javadoc generation failed. Generated Javadoc options file (useful for troubleshooting): '/home/cdhuser/gobblin/build/gobblin-rest-client/tmp/javadoc/javadoc.options'
(2)
So I tried `./gradlew -x javadoc -PhadoopVersion=2.6.0-cdh5.9.0 -q clean assemble --stacktrace` instead, and I believe it succeeded; I got the messages below.
2 warnings
Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-couchbase/src/main/java/gobblin/couchbase/writer/CouchbaseWriter.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-couchbase/src/main/java/gobblin/couchbase/writer/CouchbaseWriter.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
/home/cdhuser/gobblin/gobblin-runtime/src/main/java/gobblin/runtime/mapreduce/GobblinWorkUnitsInputFormat.java:124: warning: Generating equals/hashCode implementation but without a call to superclass, even though this class does not extend java.lang.Object. If this is intentional, add '@EqualsAndHashCode(callSuper=false)' to your type.
@EqualsAndHashCode
^
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /home/cdhuser/gobblin/gobblin-compaction/src/main/java/gobblin/compaction/mapreduce/avro/AvroKeyDedupReducer.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /home/cdhuser/gobblin/gobblin-cluster/src/main/java/gobblin/cluster/GobblinHelixTaskStateTracker.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /home/cdhuser/gobblin/gobblin-cluster/src/main/java/gobblin/cluster/GobblinHelixJob.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Note: /home/cdhuser/gobblin/gobblin-aws/src/main/java/gobblin/aws/GobblinAWSClusterLauncher.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-helix/src/main/java/gobblin/runtime/ZkDatasetStateStore.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-azkaban/src/main/java/gobblin/azkaban/AzkabanIntegrationTestLauncher.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
(3)
Then I ran the following command:
LPATH=/home/cdhuser/gobblin/build/gobblin-distribution/distributions/gobblin-dist/lib
BUILD=0.9.0-120-g75ebc38
./bin/gobblin-mapreduce.sh \
--jt http://(IP where RM is running):8032 \
--conf confGobblinKafkaTestJobs/kafkatohdfs.pull \
--jars $LPATH/guava-retrying-2.0.0.jar,$LPATH/kafka-avro-serializer-2.0.1.jar,$LPATH/kafka-json-serializer-2.0.1.jar,$LPATH/hadoop-common-2.6.0-cdh5.9.0.jar,$LPATH/gobblin-metrics-base-$BUILD.jar,$LPATH/gobblin-metrics-$BUILD.jar,$LPATH/gobblin-core-base-$BUILD.jar,$LPATH/gobblin-kafka-08-$BUILD.jar,$LPATH/gobblin-kafka-common-$BUILD.jar
It fails with this error: `Error: java.lang.ClassNotFoundException: org.reflections.Reflections`.
So I added $LPATH/javassist-3.18.2-GA.jar, but I get the same error.
Any help would be really appreciated!
Thanks,
Anshu
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-276652455
anshuGithubData wrote on 2017-02-01T13:44:23Z : @wosiu @panagiotious @shirshanka If you could take a look at the issue I described above and help me out, that would be great!
Thanks
Anshu
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-276660177
wosiu wrote on 2017-02-14T23:34:41Z : @anshuGithubData
Yes, after upgrading to the current master (commit a1b9bf579fb6) I got the same error:
`It is failing with this error Error: java.lang.ClassNotFoundException: org.reflections.Reflections.`
Appending the following to the --jars flag did the trick for me:
$LPATH/reflections-0.9.10.jar,$LPATH/javassist-3.18.2-GA.jar,$LPATH/opencsv-3.8.jar
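When a `ClassNotFoundException` names a class missing from the `--jars` list, the jar that provides it can be located by scanning `lib/`. This is a minimal sketch; `FindClassJar` is a hypothetical helper, not part of Gobblin. It looks for the matching `.class` entry inside each jar, so for `org.reflections.Reflections` it would be expected to report the reflections jar. It does not resolve that jar's own dependencies (javassist, in this thread), so each further missing class needs another lookup.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists the jars in a directory that contain a given class. */
public class FindClassJar {

    public static List<Path> jarsContaining(Path libDir, String className) throws IOException {
        // org.reflections.Reflections -> org/reflections/Reflections.class
        String entryName = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                // A jar is a zip archive, so look the entry up directly instead of loading classes
                try (ZipFile zip = new ZipFile(jar.toFile())) {
                    if (zip.getEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java FindClassJar <lib-dir> <fully.qualified.ClassName>");
            return;
        }
        for (Path jar : jarsContaining(Paths.get(args[0]), args[1])) {
            System.out.println(jar);
        }
    }
}
```

For example: `java FindClassJar "$LPATH" org.reflections.Reflections`.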
Also, it seems the authors advise using gobblin.sh instead of gobblin-mapreduce.sh, although I still use gobblin-mapreduce.sh because it still works fine for me.
Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-279871219