Apache Gobblin / GOBBLIN-67

gobblin-mapreduce.sh pulling in insufficient runtime dependencies in g0.9.0 for Kafka->HDFS ingestion

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      https://github.com/linkedin/gobblin/blob/master/gobblin-docs/case-studies/Kafka-HDFS-Ingestion.md

      The guide sets:
      source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
      extract.namespace=gobblin.extract.kafka

      but running the job fails with:
      Exception in thread main java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.KafkaSimpleSource

      The class was moved in commit 130afec71: Moving Kafka dependencies out into version specific modules (#1417),
      and it seems that the gobblin-modules module it was moved to does not build at all.

      Github Url : https://github.com/linkedin/gobblin/issues/1483
      Github Reporter : wosiu
      Github Assignee : shirshanka
      Github Created At : 2016-12-22T00:11:10Z
      Github Updated At : 2017-02-14T23:36:32Z

      Comments


      chavdar wrote on 2016-12-22T04:38:29Z : Hi Michal,

      Are you running gobblin-example, gobblin-distribution or a different
      packaging?

      Thanks.

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-268716836


      panagiotious wrote on 2016-12-24T23:18:30Z : I think I am facing the same problem:
      ```
      $ bin/gobblin-mapreduce.sh --workdir gobblin-dirs/work --conf ~/gobblin/config/kafka1.pull
      [...Hadoop gossip...]
      Exception in thread main java.lang.ClassNotFoundException: gobblin.source.extractor.extract.kafka.KafkaSimpleSource
      at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      at java.lang.Class.forName0(Native Method)
      at java.lang.Class.forName(Class.java:195)
      at gobblin.runtime.JobContext.createSource(JobContext.java:216)
      at gobblin.runtime.JobContext.<init>(JobContext.java:148)
      at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:140)
      at gobblin.runtime.mapreduce.MRJobLauncher.<init>(MRJobLauncher.java:130)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.<init>(CliMRJobLauncher.java:54)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:106)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      ```

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269103855


      shirshanka wrote on 2016-12-25T00:53:21Z : @panagiotious : How are you pulling in the gobblin jars?
      You need gobblin-kafka-08 jar on your classpath.
      gobblin-core is supposed to pull this in transitively.

      https://mvnrepository.com/artifact/com.linkedin.gobblin/gobblin-kafka-08/0.9.0

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269106010
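
To see which jar, if any, actually ships a given class, the distribution's `lib/` directory can be scanned. The following is a minimal sketch, not part of Gobblin: the function names `class_to_entry` and `find_class_jar` are made up for illustration, and it assumes `unzip` is installed (`jar tf` would serve equally well).

```shell
# Convert a fully qualified class name to its entry path inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar under directory $1 whose listing contains class $2.
# Assumption: `unzip` is available on the PATH.
find_class_jar() {
  local lib_dir=$1 entry jar
  entry=$(class_to_entry "$2")
  for jar in "$lib_dir"/*.jar; do
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -q -F "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, `find_class_jar gobblin-dist/lib gobblin.source.extractor.extract.kafka.KafkaSimpleSource` lists the jars that contain the class. If nothing is printed, the class is not in that directory at all, which points to a packaging problem rather than a classpath problem.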


      panagiotious wrote on 2016-12-25T03:32:17Z : After extracting the tarball created by the `./gradlew clean assemble` command (`build` fails on the `metastore` tests for some weird reason that I still cannot debug - but there is no dependency referenced in the documentation), I can see `gobblin-dist/lib/gobblin-kafka-08-0.9.0-24-gec7d3a2.jar` and `gobblin-dist/lib/gobblin-kafka-common-0.9.0-24-gec7d3a2.jar` in the `lib/` directory. I assume that directory is added to the `classpath` when I submit the job, or it would not be able to find any `jar`.

      Weirdly enough, if I copy `lib/gobblin-core-0.8.0.jar` that contains the class `KafkaSimpleSource` from the distribution of v0.8.0 (the downloaded tarball file) to my `lib` directory, the aforementioned error (missing class) does not appear, but the job hangs and is not submitted to my cluster.

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269108794


      shirshanka wrote on 2016-12-25T07:31:30Z : Is Kafka ingestion working for you in standalone mode? (http://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/#standalone)

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269113024


      panagiotious wrote on 2016-12-25T07:50:25Z : Yes, perfectly! I have tried it both with the Wikipedia example and our Kafka topics.

      I just realized that the `assemble` command has been giving me an incomplete distribution. I have been using Java 7, which is deployed by the Cloudera Manager to our edge nodes and which, it turns out, is not supported by Gobblin 0.8.0; it makes the `metastore` tests fail, and (probably) some jars silently fail to build and are left out of the final `tarball`.

      I have successfully assembled Gobblin with Java 8 on a different host, but it will not run on our edge nodes. The closest I got with copying jars from version 0.8.0 to the manually assembled version 0.9.0 was getting the job submitted and then it failing with `Error: java.lang.ClassNotFoundException: gobblin.metrics.MetricContext at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at [...]`.

      At this point I should note that Java 7 fails to complete a `build` at the `metastore` tests, whereas Java 8 seems to get past that but fails when the tests make extraordinary assumptions, like ZooKeeper running on the host - the host has `iptables` rules that would block any communication that is not expected.

      I do not understand why `assemble` would yield a different distribution than `build`, but I am pretty confident that that's the case when tests are failing. The command I've been using has been `./gradlew clean assemble -PuseHadoop2 -PhadoopVersion=2.6.0-cdh5.8.3 --stacktrace --info` in all my builds. My config file is the MR example with Kafka (with changes in the hosts of course).

      I will keep trying, although if there is no Java 7 compatibility, I am not confident I can get this up and running with CDH on our cluster.

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269113431
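
Since an unsupported JDK produced an incomplete distribution here, a quick version check before running Gradle can save a build cycle. This is a sketch only: the helper names are hypothetical, and the Java 8 threshold is taken from the comment above, not from Gobblin's documentation.

```shell
# Extract the major Java version from a version string such as
# "1.7.0_80" (Java 7), "1.8.0_112" (Java 8) or "11.0.2" (Java 11).
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;
  esac
}

# Read the version string reported by the `java` found on the PATH.
current_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return non-zero when the JDK is older than Java 8.
# Assumption: 8 is the minimum, as reported in this thread.
check_java() {
  local major
  major=$(java_major "$(current_java_version)")
  if [ "${major:-0}" -lt 8 ]; then
    echo "Java ${major:-unknown} detected; this thread needed Java 8 to build" >&2
    return 1
  fi
}
```

Running `check_java && ./gradlew clean assemble` would then skip the build on a Java 7 host instead of producing a partial tarball.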


      wosiu wrote on 2016-12-25T12:43:55Z : Fixed for me. #1490 works
      FYI, the way I build:
      ```
      ./gradlew -PhadoopVersion=2.6.0-cdh5.4.3 -q clean assemble
      ```
      and run:
      ```
      LPATH=/home/michalw/gobblin-dist/lib
      BUILD=0.9.0-27-gf497089
      ./gobblin-dist/bin/gobblin-mapreduce.sh \
      --jt yarnrm \
      --fs hdfs://logs-hdfs-nameserivce \
      --conf gobblin-job-config/gobblin-mr-ingestion.properties \
      --logdir gobblin-logs \
      --workdir /home/michalw/gobblin-work \
      -jars $LPATH/guava-retrying-2.0.0.jar,$LPATH/kafka-avro-serializer-2.0.1.jar,$LPATH/kafka-json-serializer-2.0.1.jar,$LPATH/hadoop-common-2.6.0-cdh5.4.3.jar,$LPATH/gobblin-metrics-base-$BUILD.jar,$LPATH/gobblin-metrics-$BUILD.jar,$LPATH/gobblin-core-base-$BUILD.jar,$LPATH/gobblin-kafka-08-$BUILD.jar,$LPATH/gobblin-kafka-common-$BUILD.jar
      ```
      Unfortunately, I need to specify all those jars by hand, but it works.
      @panagiotious let me know if I may close this one.

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269121168
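
With a hand-written jar list this long, one mistyped path is easy to miss. A small pre-flight check, sketched below, prints every entry of a comma-separated list that does not exist on disk. The function name `missing_jars` is hypothetical and not something Gobblin provides.

```shell
# Print each entry of a comma-separated jar list that is not a regular file.
missing_jars() {
  local old_ifs=$IFS jar
  IFS=,
  for jar in $1; do
    [ -f "$jar" ] || printf '%s\n' "$jar"
  done
  IFS=$old_ifs
}
```

Calling `missing_jars "$JARS"` before submitting the job, where `JARS` holds the list passed to the script, surfaces typos such as a missing hyphen before the version. Otherwise they would only show up later as a `ClassNotFoundException` on the cluster.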


      panagiotious wrote on 2016-12-25T23:52:26Z : Yes, this looks like it works. I assumed that the `lib/` directory would already have been added to the jar dependencies, but I guess we need to specify them explicitly.

      Thank you!

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269142272


      shirshanka wrote on 2016-12-28T08:04:11Z : Looks like gobblin-mapreduce.sh is selectively pulling in specific jars for its runtime deps from /lib.
      https://github.com/linkedin/gobblin/blob/master/bin/gobblin-mapreduce.sh#L130

      That's what broke your jobs. We'll figure out the best maintainable solution long term and update the script. Let's keep this issue open.

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269441365
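
Until the script is updated, one possible workaround is to generate the jar list from the whole `lib/` directory instead of typing it by hand. The sketch below only builds the comma-separated string. `all_jars` is a hypothetical helper, and shipping every jar is an assumption that may pull conflicting versions onto the cluster.

```shell
# Join every *.jar in a directory into one comma-separated string.
all_jars() {
  local lib_dir=$1 jar list=
  for jar in "$lib_dir"/*.jar; do
    [ -f "$jar" ] || continue
    list=${list:+$list,}$jar
  done
  printf '%s\n' "$list"
}
```

The result can be passed wherever the thread passes its jar list, for example `-jars "$(all_jars /home/michalw/gobblin-dist/lib)"` (flag spelling copied from the commands above).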


      wosiu wrote on 2016-12-28T09:02:32Z : OK, just for reference, I created this some time ago:
      https://github.com/linkedin/gobblin/issues/1466

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-269447676


      wosiu wrote on 2017-01-05T03:53:17Z : In the meantime: is it possible to build Gobblin so that the jars have no version infix?
      I mean, instead of e.g.:
      gobblin-yarn-0.9.0-28-gcb609f2.jar
      we would have:
      gobblin-yarn.jar ?
      I'm aware I can rename them after the build, but maybe there is already some knob in your Gradle config?

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-270557996
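
The post-build renaming mentioned above can be sketched as follows. This is not a Gradle setting: the helper names are hypothetical, and the pattern assumes versions start with `-<digits>.<digits>`, as they do in the jar names quoted in this thread.

```shell
# Drop the version infix from a jar file name: everything from the
# first "-<digits>.<digits>" up to the ".jar" suffix is removed.
strip_version() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]+(\.[0-9]+)+.*\.jar$/.jar/'
}

# Copy every jar from directory $1 into directory $2 under its
# version-less name. Two jars that map to the same name overwrite
# each other, so check the result before relying on it.
copy_unversioned() {
  local src_dir=$1 dest_dir=$2 jar
  mkdir -p "$dest_dir"
  for jar in "$src_dir"/*.jar; do
    [ -f "$jar" ] || continue
    cp "$jar" "$dest_dir/$(strip_version "$(basename "$jar")")"
  done
}
```

With this, `strip_version gobblin-yarn-0.9.0-28-gcb609f2.jar` yields `gobblin-yarn.jar`, and names whose artifact id ends in digits, such as `gobblin-kafka-08-0.9.0-24-gec7d3a2.jar`, keep that suffix and become `gobblin-kafka-08.jar`.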


      anshuGithubData wrote on 2017-02-01T13:05:23Z : Hello everyone,
      I am facing a similar issue.
      Following the steps wosiu described, I tried the following.

      (1)
      When I assemble with **./gradlew -PhadoopVersion=2.6.0-cdh5.9.0 -q clean assemble**, it fails with the following error:
      Task failed with an exception.
      -----------

      • What went wrong:
        Execution failed for task ':gobblin-api:javadoc'.
        > Javadoc generation failed. Generated Javadoc options file (useful for troubleshooting): '/home/cdhuser/gobblin/build/gobblin-api/tmp/javadoc/javadoc.options'
      • Try:
        Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
        ==============================================================================

      Task failed with an exception.
      -----------

      • What went wrong:
        Execution failed for task ':gobblin-rest-service:gobblin-rest-client:javadoc'.
        > Javadoc generation failed. Generated Javadoc options file (useful for troubleshooting): '/home/cdhuser/gobblin/build/gobblin-rest-client/tmp/javadoc/javadoc.options'

      (2)
      So I tried *./gradlew -x javadoc -PhadoopVersion=2.6.0-cdh5.9.0 -q clean assemble --stacktrace*, which I believe succeeded; I got the output below.
      2 warnings
      Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-couchbase/src/main/java/gobblin/couchbase/writer/CouchbaseWriter.java uses or overrides a deprecated API.
      Note: Recompile with -Xlint:deprecation for details.
      Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-couchbase/src/main/java/gobblin/couchbase/writer/CouchbaseWriter.java uses unchecked or unsafe operations.
      Note: Recompile with -Xlint:unchecked for details.
      /home/cdhuser/gobblin/gobblin-runtime/src/main/java/gobblin/runtime/mapreduce/GobblinWorkUnitsInputFormat.java:124: warning: Generating equals/hashCode implementation but without a call to superclass, even though this class does not extend java.lang.Object. If this is intentional, add '@EqualsAndHashCode(callSuper=false)' to your type.
      @EqualsAndHashCode
      ^
      Note: Some input files use or override a deprecated API.
      Note: Recompile with -Xlint:deprecation for details.
      Note: /home/cdhuser/gobblin/gobblin-compaction/src/main/java/gobblin/compaction/mapreduce/avro/AvroKeyDedupReducer.java uses unchecked or unsafe operations.
      Note: Recompile with -Xlint:unchecked for details.
      Note: Some input files use or override a deprecated API.
      Note: Recompile with -Xlint:deprecation for details.
      Note: Some input files use or override a deprecated API.
      Note: Recompile with -Xlint:deprecation for details.
      Note: Some input files use unchecked or unsafe operations.
      Note: Recompile with -Xlint:unchecked for details.
      1 warning
      Note: Some input files use unchecked or unsafe operations.
      Note: Recompile with -Xlint:unchecked for details.
      Note: Some input files use or override a deprecated API.
      Note: Recompile with -Xlint:deprecation for details.
      Note: /home/cdhuser/gobblin/gobblin-cluster/src/main/java/gobblin/cluster/GobblinHelixTaskStateTracker.java uses or overrides a deprecated API.
      Note: Recompile with -Xlint:deprecation for details.
      Note: /home/cdhuser/gobblin/gobblin-cluster/src/main/java/gobblin/cluster/GobblinHelixJob.java uses unchecked or unsafe operations.
      Note: Recompile with -Xlint:unchecked for details.
      Note: /home/cdhuser/gobblin/gobblin-aws/src/main/java/gobblin/aws/GobblinAWSClusterLauncher.java uses or overrides a deprecated API.
      Note: Recompile with -Xlint:deprecation for details.
      Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-helix/src/main/java/gobblin/runtime/ZkDatasetStateStore.java uses unchecked or unsafe operations.
      Note: Recompile with -Xlint:unchecked for details.
      Note: /home/cdhuser/gobblin/gobblin-modules/gobblin-azkaban/src/main/java/gobblin/azkaban/AzkabanIntegrationTestLauncher.java uses unchecked or unsafe operations.
      Note: Recompile with -Xlint:unchecked for details.

      (3)
      Then I run the following command:
      LPATH=/home/cdhuser/gobblin/build/gobblin-distribution/distributions/gobblin-dist/lib
      BUILD=0.9.0-120-g75ebc38
      ./bin/gobblin-mapreduce.sh \
      --jt http://(IP where RM is running):8032 \
      --conf confGobblinKafkaTestJobs/kafkatohdfs.pull \
      -jars $LPATH/guava-retrying-2.0.0.jar,$LPATH/kafka-avro-serializer-2.0.1.jar,$LPATH/kafka-json-serializer-2.0.1.jar,$LPATH/hadoop-common-2.6.0-cdh5.9.0.jar,$LPATH/gobblin-metrics-base-$BUILD.jar,$LPATH/gobblin-metrics-$BUILD.jar,$LPATH/gobblin-core-base-$BUILD.jar,$LPATH/gobblin-kafka-08-$BUILD.jar,$LPATH/gobblin-kafka-common-$BUILD.jar

      It fails with *Error: java.lang.ClassNotFoundException: org.reflections.Reflections*.
      I then added $LPATH/javassist-3.18.2-GA.jar, but I get the same error.

      Any help would be really appreciated!
      Thanks,
      Anshu

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-276652455


      anshuGithubData wrote on 2017-02-01T13:44:23Z : @wosiu @panagiotious @shirshanka Could you please take a look at the issue I am facing (described above)? Any help would be great!
      Thanks
      Anshu

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-276660177


      wosiu wrote on 2017-02-14T23:34:41Z : @anshuGithubData
      Yes, after upgrading to the current master (commit: a1b9bf579fb6) I got the same error:
      `Error: java.lang.ClassNotFoundException: org.reflections.Reflections`
      Appending the following to the --jars flag fixed it for me:

      $LPATH/reflections-0.9.10.jar,$LPATH/javassist-3.18.2-GA.jar,$LPATH/opencsv-3.8.jar

      Also, it seems that the authors advise using gobblin.sh instead of gobblin-mapreduce.sh, although I still use gobblin-mapreduce.sh, as it works fine for me.

      Github Url : https://github.com/linkedin/gobblin/issues/1483#issuecomment-279871219

          People

            agepati Raul A
            wosiu Michał Woś