Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
>>> Today, if an exception occurs at the task level, it is not propagated to the commit phase. As a result, in fork.commit we only see exceptions like this, with no indication of the underlying cause:
2018/04/30 08:03:19.369 ERROR [Task] [Task-committing-pool-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_DYNAMICS-CONTACT-438563007_1525075320170_0
at org.apache.gobblin.runtime.Task.commit(Task.java:884)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:167)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:162)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
>>> However, the root cause occurred earlier, before the commit phase: in the task run() stage, some records failed to process:
2018/04/30 08:03:19.352 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Processing record incurs an unexpected exception:
java.lang.IllegalStateException: Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 has failed and is no longer running
at org.apache.gobblin.runtime.fork.Fork.putRecord(Fork.java:285)
at org.apache.gobblin.runtime.Task.processRecord(Task.java:778)
at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:459)
at org.apache.gobblin.runtime.Task.run(Task.java:341)
at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.353 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
java.lang.RuntimeException
at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:464)
at org.apache.gobblin.runtime.Task.run(Task.java:341)
at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.368 INFO [com_2792] [TaskState
>>> Looking further into the problem, we find that it is caused by a record-processing timeout in the Espresso writer:
2018/04/30 08:03:19.348 ERROR [Fork-0] [ForkExecutor-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed to process data records
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:143)
at org.apache.gobblin.writer.RetryWriter.writeEnvelope(RetryWriter.java:123)
at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:492)
at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:238)
at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
at ligobblin.shaded.com.github.rholder.retry.Retryer$ExceptionAttempt.<init>(Retryer.java:254)
at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:163)
at ligobblin.shaded.com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:141)
... 11 more
Caused by: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
at org.apache.gobblin.writer.AsyncWriterManager.maybeThrow(AsyncWriterManager.java:309)
at org.apache.gobblin.writer.AsyncWriterManager.write(AsyncWriterManager.java:271)
at org.apache.gobblin.writer.AsyncWriterManager.writeEnvelope(AsyncWriterManager.java:259)
at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.writeEnvelope(CloseOnFlushWriterWrapper.java:93)
at org.apache.gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeEnvelope(InstrumentedDataWriterDecorator.java:75)
at org.apache.gobblin.writer.PartitionedDataWriter.writeEnvelope(PartitionedDataWriter.java:161)
at org.apache.gobblin.writer.ThrottleWriter.writeEnvelope(ThrottleWriter.java:131)
at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:118)
at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:115)
at ligobblin.shaded.com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:160)
... 13 more
Caused by: java.lang.RuntimeException: java.io.IOException: java.util.concurrent.TimeoutException
at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:199)
at org.apache.gobblin.proxies.EspressoProxy.get(EspressoProxy.java:216)
at org.apache.gobblin.writer.http.espresso.EspressoWriter.changeExist(EspressoWriter.java:81)
at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:89)
at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:86)
... 4 more
Caused by: java.io.IOException: java.util.concurrent.TimeoutException
at com.linkedin.espresso.client.r2d2impl.R2D2EspressoClient.execute(R2D2EspressoClient.java:560)
at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:162)
... 8 more
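>>> To illustrate the improvement being asked for, here is a minimal sketch of one way a failure seen during the run phase could be carried into the commit phase, so that the commit-time exception includes the original cause chain. This is not Gobblin's actual code or the fix that was merged: SketchFork, processRecord, commit and rootCause are hypothetical names used only for illustration, and the nested exception merely mimics the RuntimeException -> IOException -> TimeoutException chain from the log above.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class FailurePropagationSketch {

    /** Hypothetical stand-in for a fork; NOT Gobblin's Fork class. */
    static class SketchFork {
        // First failure observed while processing records, if any.
        private final AtomicReference<Throwable> failureCause = new AtomicReference<>();

        /** Run phase: remember the first failure instead of only logging it. */
        void processRecord(Runnable write) {
            try {
                write.run();
            } catch (RuntimeException e) {
                // Keep only the first failure; later ones are usually follow-on errors.
                failureCause.compareAndSet(null, e);
            }
        }

        /** Commit phase: fail with the recorded failure attached as the cause. */
        void commit() {
            Throwable cause = failureCause.get();
            if (cause != null) {
                throw new IllegalStateException("Fork failed before commit", cause);
            }
        }
    }

    /** Walks the getCause() links down to the innermost exception. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        SketchFork fork = new SketchFork();
        // Simulate a writer timeout wrapped the same way as in the log above.
        fork.processRecord(() -> {
            throw new RuntimeException(new IOException(new TimeoutException("write timed out")));
        });
        try {
            fork.commit();
        } catch (IllegalStateException e) {
            System.out.println("commit failed, root cause: " + rootCause(e));
        }
    }
}
```

With the cause attached this way, the stack trace logged at commit time would end in a "Caused by:" section pointing at the timeout, rather than a bare ForkException that has to be correlated with earlier log lines by hand.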