Apache Gobblin / GOBBLIN-484

Propagate fork exception to task commit


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      >>> Today, if an exception occurs at the task level, it is not propagated to the commit phase, so at commit time we only see a generic exception like this:

      2018/04/30 08:03:19.369 ERROR [Task] [Task-committing-pool-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
      org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_DYNAMICS-CONTACT-438563007_1525075320170_0
      at org.apache.gobblin.runtime.Task.commit(Task.java:884)
      at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:167)
      at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:162)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      >>> However, the root cause occurred earlier, before the commit phase: during the task run() stage, some records failed to process:

      2018/04/30 08:03:19.352 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Processing record incurs an unexpected exception:
      java.lang.IllegalStateException: Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 has failed and is no longer running
      at org.apache.gobblin.runtime.fork.Fork.putRecord(Fork.java:285)
      at org.apache.gobblin.runtime.Task.processRecord(Task.java:778)
      at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:459)
      at org.apache.gobblin.runtime.Task.run(Task.java:341)
      at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
      at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      2018/04/30 08:03:19.353 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
      java.lang.RuntimeException
      at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:464)
      at org.apache.gobblin.runtime.Task.run(Task.java:341)
      at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
      at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      2018/04/30 08:03:19.368 INFO [com_2792] [TaskState

      >>> Looking further into the problem, we find that it was caused by a record-processing timeout in the Espresso writer:

      2018/04/30 08:03:19.348 ERROR [Fork-0] [ForkExecutor-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed to process data records
      java.io.IOException: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
      at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:143)
      at org.apache.gobblin.writer.RetryWriter.writeEnvelope(RetryWriter.java:123)
      at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:492)
      at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
      at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
      at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:238)
      at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
      at ligobblin.shaded.com.github.rholder.retry.Retryer$ExceptionAttempt.<init>(Retryer.java:254)
      at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:163)
      at ligobblin.shaded.com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
      at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:141)
      ... 11 more
      Caused by: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
      at org.apache.gobblin.writer.AsyncWriterManager.maybeThrow(AsyncWriterManager.java:309)
      at org.apache.gobblin.writer.AsyncWriterManager.write(AsyncWriterManager.java:271)
      at org.apache.gobblin.writer.AsyncWriterManager.writeEnvelope(AsyncWriterManager.java:259)
      at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.writeEnvelope(CloseOnFlushWriterWrapper.java:93)
      at org.apache.gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeEnvelope(InstrumentedDataWriterDecorator.java:75)
      at org.apache.gobblin.writer.PartitionedDataWriter.writeEnvelope(PartitionedDataWriter.java:161)
      at org.apache.gobblin.writer.ThrottleWriter.writeEnvelope(ThrottleWriter.java:131)
      at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:118)
      at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:115)
      at ligobblin.shaded.com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
      at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:160)
      ... 13 more
      Caused by: java.lang.RuntimeException: java.io.IOException: java.util.concurrent.TimeoutException
      at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:199)
      at org.apache.gobblin.proxies.EspressoProxy.get(EspressoProxy.java:216)
      at org.apache.gobblin.writer.http.espresso.EspressoWriter.changeExist(EspressoWriter.java:81)
      at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:89)
      at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:86)
      ... 4 more
      Caused by: java.io.IOException: java.util.concurrent.TimeoutException
      at com.linkedin.espresso.client.r2d2impl.R2D2EspressoClient.execute(R2D2EspressoClient.java:560)
      at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:162)
      ... 8 more
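
      >>> A minimal sketch of the requested behavior, under stated assumptions: each fork remembers the first exception that made it fail, and the commit step attaches that exception as the cause of the fork failure it throws. All class and method names below (SketchFork, SketchForkException, commit, rootCause) are hypothetical and are not Gobblin's actual Fork/Task API; this only illustrates the idea, not the fix that was merged.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these types are Gobblin's real Fork/Task classes.
public class ForkFailurePropagation {

    /** A fork branch that remembers the first exception that made it fail. */
    static class SketchFork {
        private final int index;
        private final AtomicReference<Throwable> failure = new AtomicReference<>();

        SketchFork(int index) { this.index = index; }

        /** Called from the fork's worker thread when record processing fails. */
        void recordFailure(Throwable t) {
            // Keep only the first failure; later ones are usually consequences of it.
            failure.compareAndSet(null, t);
        }

        boolean isFailed() { return failure.get() != null; }
        Throwable getFailure() { return failure.get(); }
        int getIndex() { return index; }
    }

    /** Thrown at commit time; carries the original fork failure as its cause. */
    static class SketchForkException extends Exception {
        SketchForkException(String message, Throwable cause) { super(message, cause); }
    }

    /** Commit step: fails if any fork failed, attaching each fork's recorded exception. */
    static void commit(String taskId, List<SketchFork> forks) throws SketchForkException {
        List<Integer> failedIndexes = new ArrayList<>();
        List<Throwable> failures = new ArrayList<>();
        for (SketchFork fork : forks) {
            if (fork.isFailed()) {
                failedIndexes.add(fork.getIndex());
                failures.add(fork.getFailure());
            }
        }
        if (failures.isEmpty()) {
            return; // nothing failed, commit can proceed
        }
        // The first failure becomes the cause; the rest are kept as suppressed exceptions.
        SketchForkException e = new SketchForkException(
            "Fork branches " + failedIndexes + " failed for task " + taskId, failures.get(0));
        failures.subList(1, failures.size()).forEach(e::addSuppressed);
        throw e;
    }

    /** Walks the cause chain down to the innermost exception. */
    static Throwable rootCause(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) {
        SketchFork fork0 = new SketchFork(0);
        // Simulate the writer timing out inside the fork's worker thread.
        fork0.recordFailure(new IOException(new TimeoutException("simulated write timeout")));
        try {
            commit("task_example_0", Collections.singletonList(fork0));
        } catch (SketchForkException e) {
            System.out.println(e.getMessage());
            System.out.println("root cause: " + rootCause(e));
        }
    }
}
```

      With something along these lines, the stack trace logged at commit time would include the writer's timeout in its "Caused by" chain, instead of requiring a search through earlier log lines from other threads.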
       

          People

            yukuai518 Kuai Yu
            yukuai518 Kuai Yu