  1. Apache Gobblin
  2. GOBBLIN-323

Job run in the cluster mode failed due to incorrect inference of state store URI from the default appWorkDir



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gobblin-cluster
    • Labels:



      When running the cluster using the default launch script, a simple HelloWorld job fails with the following error:

      2017-11-28 15:36:42 PST ERROR [JobScheduler-0] org.apache.gobblin.cluster.GobblinHelixJobScheduler$NonScheduledJobRunner - Failed to run job HelloWorld
      org.apache.gobblin.runtime.JobException: Failed to run job HelloWorld
      at org.apache.gobblin.cluster.GobblinHelixJobScheduler.runJob(GobblinHelixJobScheduler.java:117)
      at org.apache.gobblin.cluster.GobblinHelixJobScheduler$NonScheduledJobRunner.run(GobblinHelixJobScheduler.java:190)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
      at java.net.URI$Parser.fail(URI.java:2848)
      at java.net.URI$Parser.failExpecting(URI.java:2854)
      at java.net.URI$Parser.parse(URI.java:3057)
      at java.net.URI.<init>(URI.java:673)
      at org.apache.gobblin.cluster.GobblinHelixJobLauncher.<init>(GobblinHelixJobLauncher.java:156)
      at org.apache.gobblin.cluster.GobblinHelixJobScheduler.buildGobblinHelixJobLauncher(GobblinHelixJobScheduler.java:123)
      at org.apache.gobblin.cluster.GobblinHelixJobScheduler.runJob(GobblinHelixJobScheduler.java:114)
      ... 4 more



      The failing call is the seven-argument java.net.URI constructor. When it is given a scheme but a null host, a port of -1, and a null path, it assembles the string "file:" and fails to parse it, as the trace above shows. This is the case with the default appWorkDir, e.g. file:/Users/username/standalone_cluster/1, which has no host name or port.
      A config like file:///Users/username/standalone_cluster/1 has the same issue.
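      The components involved can be inspected directly with java.net.URI. The following is a minimal sketch, assuming the appWorkDir URI decomposes the same way as a plain java.net.URI; the class name UriComponents and the helper describe are hypothetical and not part of Gobblin.

```java
import java.net.URI;

public class UriComponents {
    // Hypothetical helper: returns "scheme|host|port|path" for a URI string.
    static String describe(String s) {
        URI u = URI.create(s);
        return u.getScheme() + "|" + u.getHost() + "|" + u.getPort() + "|" + u.getPath();
    }

    public static void main(String[] args) {
        // Both spellings of the local work dir have no host (null) and no port (-1).
        System.out.println(describe("file:/Users/username/standalone_cluster/1"));
        System.out.println(describe("file:///Users/username/standalone_cluster/1"));
    }
}
```

      Both calls report a null host and a port of -1, so only the scheme is left once the path is dropped.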

      The following unit tests demonstrate the issue.

      package org.apache.gobblin.cluster;

      import java.net.URI;
      import java.net.URISyntaxException;
      import org.junit.Test;

      public class UriTest {

        // Scheme only: the constructor assembles "file:" and fails to parse it.
        @Test(expected = URISyntaxException.class)
        public void construct_uri_with_schema_only_throws_exception() throws URISyntaxException {
          new URI("file", null, null, -1, null, null, null);
        }

        // Scheme with host name and port, no path.
        @Test
        public void construct_uri_with_schema_and_hostname() throws URISyntaxException {
          URI uri = new URI("file", null, "localhost", 80, null, null, null);
          System.out.println(uri);
        }

        // Scheme with a path, no host name or port.
        @Test
        public void construct_uri_with_schema_and_path() throws URISyntaxException {
          URI uri = new URI("file", null, null, -1, "///", null, null);
          System.out.println(uri);
        }

        // Single-string constructor.
        @Test
        public void construct_uri_with_string() throws URISyntaxException {
          URI uri = new URI("file:/");
          System.out.println(uri);
        }
      }


      This issue appears to be introduced by


      A potential fix is to pass "/" as the path parameter:
      new URI(appWorkDir.toUri().getScheme(), null, appWorkDir.toUri().getHost(),
      appWorkDir.toUri().getPort(), "/", null, null).toString()
      It seems to work for the default case, but I am not sure whether it will work for all cases.
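      The effect of the proposed change can be tried without Hadoop on the classpath. The following is a rough sketch, assuming java.net.URI stands in for appWorkDir.toUri(); the class StateStoreUriSketch, the helper rootUriOf, and the host "namenode" are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class StateStoreUriSketch {
    // Hypothetical helper: keeps the scheme, host and port of the work dir URI
    // and replaces the path with "/", as proposed in this issue.
    static String rootUriOf(URI workDir) throws URISyntaxException {
        return new URI(workDir.getScheme(), null, workDir.getHost(),
            workDir.getPort(), "/", null, null).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Local file system: no host or port, so the result is "file:/".
        System.out.println(rootUriOf(URI.create("file:/Users/username/standalone_cluster/1")));
        // Assumed HDFS-style URI with a host and port.
        System.out.println(rootUriOf(URI.create("hdfs://namenode:8020/gobblin/work")));
    }
}
```

      This covers only a local path and one URI with a host and port; other file systems and relative work directories would still need to be checked.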

      Can this logic be removed? Doesn't setting the proper fs config achieve the same goal?





              • Assignee:
                hutran Hung Tran
                HappyRay Ray Yang

