Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
A customer runs an AWS Glue 4.0 streaming job (based on Spark 3.3.0) using the Apache Hudi 0.14.0 libraries. The customer enabled the Hudi CloudWatch metrics reporter in the job. It succeeded in publishing metrics for the first batch, but every subsequent batch failed, so only a single data sample was published to CloudWatch metrics. The error stacktrace is as follows:
2023-11-09 15:59:17,775 ERROR [stream execution thread for [id = d31c62e2-e697-40b7-b6da-854cf9a8cb14, runId = 0f2325c6-83f4-4dfa-a849-3b3189423a9b]] cloudwatch.CloudWatchReporter (CloudWatchReporter.java:report(236)): Error reporting metrics to CloudWatch. The data in this CloudWatch request may have been discarded, and not made it to CloudWatch.
java.util.concurrent.ExecutionException: org.apache.hudi.software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: event executor terminated
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_382]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_382]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:234) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:211) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
	at org.apache.hudi.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
	at org.apache.hudi.metrics.cloudwatch.CloudWatchMetricsReporter.report(CloudWatchMetricsReporter.java:71) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
	at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_382]
	at org.apache.hudi.metrics.Metrics.shutdown(Metrics.java:116) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
	at java.util.HashMap$Values.forEach(HashMap.java:982) ~[?:1.8.0_382]
	at org.apache.hudi.metrics.Metrics.shutdownAllMetrics(Metrics.java:88) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
	at org.apache.hudi.HoodieSparkSqlWriter$.cleanup(HoodieSparkSqlWriter.scala:937) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:151) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
The root cause comes from the AWS Java SDK v2's underlying Netty client:
Caused by: java.util.concurrent.RejectedExecutionException: event executor terminated
	at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:923) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:825) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.lazyExecute(SingleThreadEventExecutor.java:820) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:263) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:177) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at io.netty.util.concurrent.AbstractEventExecutorGroup.schedule(AbstractEventExecutorGroup.java:50) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
	at org.apache.hudi.software.amazon.awssdk.http.nio.netty.internal.DelegatingEventLoopGroup.schedule(DelegatingEventLoopGroup.java:153) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
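For context, the CloudWatch reporter mentioned above is typically enabled through Hudi write configs along these lines (a sketch only; the exact keys and any additional CloudWatch-specific options should be verified against the Hudi 0.14.0 configuration docs):

```properties
# Turn on Hudi metrics and select the CloudWatch reporter
hoodie.metrics.on=true
hoodie.metrics.reporter.type=CLOUDWATCH
```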
I've observed that the MetricsReporter is shut down after the first batch; however, the same MetricsReporter instance is reused in the subsequent batches, where it fails to report metrics. Given that MetricsReporter is not implemented to be reusable, we should avoid reusing instances across batches.
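The lifecycle bug can be illustrated with a minimal, self-contained model (not actual Hudi code; class and method names here are hypothetical): a reporter backed by a single executor cannot accept work after stop() terminates that executor, which mirrors Netty's "event executor terminated" rejection. The sketch also shows the shape of the fix, creating a fresh reporter per lifecycle instead of reusing a shut-down one.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a metrics reporter whose stop() is terminal.
class OneShotReporter {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    void report(Runnable metricBatch) {
        // Throws RejectedExecutionException once the executor is shut down,
        // analogous to Netty rejecting tasks on a terminated event executor.
        executor.execute(metricBatch);
    }

    void stop() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

public class ReporterReuseDemo {
    public static void main(String[] args) {
        OneShotReporter reporter = new OneShotReporter();
        reporter.report(() -> System.out.println("batch 1: metrics published"));
        reporter.stop(); // end of first batch: reporter is shut down here

        try {
            // Second batch reuses the same (already stopped) instance: fails.
            reporter.report(() -> System.out.println("batch 2: never runs"));
        } catch (RejectedExecutionException e) {
            System.out.println("batch 2 rejected: " + e.getMessage());
        }

        // Fix: do not reuse a stopped reporter; create a fresh one instead.
        OneShotReporter fresh = new OneShotReporter();
        fresh.report(() -> System.out.println("batch 2 retry: metrics published"));
        fresh.stop();
    }
}
```

The same principle applies to the real CloudWatchReporter: once Metrics.shutdown() has stopped it, subsequent batches must obtain a new reporter rather than call report() on the old instance.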