Details
Type: Bug
Status: Resolved
Priority: P0
Resolution: Fixed
None
Description
Several streaming jobs are failing without having done any work.
The following message appears in the INFO-level logs under `dataflow.googleapis.com/harness`:
Streaming engine endpoint to ipv4:209.85.200.95:443 closed unexpectedly with error code, NOT_FOUND, and will be retried if necessary. This may occur due to autoscaling events. Full status: NOT_FOUND: Requested entity was not found. === Source Location Trace: === ./third_party/grpc/google_specific/include/grpcpp/impl/codegen/status.h:97
Java tests end with the following error message:
java.lang.RuntimeException: Dataflow job ... terminated in state RUNNING but did not return a failure reason.
Python jobs fail with errors like:
ERROR apache_beam.io.gcp.tests.pubsub_matcher:pubsub_matcher.py:162 Timeout after 400 sec. Received 0 messages from projects/apache-beam-testing/subscriptions/wc_subscription_output39860e7e-f237-41fb-9f6b-4c2bd4783023.
This appears to be affecting many jobs:
https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/1044/
- The CI job appears to be timing out or crashing, so I ran the test locally to confirm:
- Job: https://console.cloud.google.com/dataflow/jobs/us-central1/2021-07-30_15_29_50-2692669480703228454?project=apache-beam-testing
Java PreCommit streaming test org.apache.beam.examples.WordCountIT.testE2EWordCount:
- https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/3988/
- https://gradle.com/s/pha5hds4halho
- Job: https://console.cloud.google.com/dataflow/jobs/us-central1/2021-07-30_15_29_50-2692669480703228454?project=apache-beam-testing
Python PreCommits (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it):
- https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/4485/ (first cron failure)
- https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2680/
- Scans: https://scans.gradle.com/s/l4s4az3rolina, https://gradle.com/s/zyzqaxlivyc7m
- Job: https://console.cloud.google.com/dataflow/jobs/us-central1/2021-07-30_11_39_34-1628210491567694938?project=apache-beam-testing
I strongly suspect that https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/2292 is also failing because of this (the timing matches, and only :validatesRunnerStreamingTests is failing, not :validatesRunnerBatchTests), but the daemon is crashing and I have not yet confirmed it locally.
Issue Links
- duplicates
  - BEAM-12676 StreamingWordCountIT is failing for Python PreCommit (Resolved)
  - BEAM-12695 WordCountIT (streaming) failing for PreCommit_Java_Examples_Dataflow (Resolved)
- is duplicated by
  - BEAM-12676 StreamingWordCountIT is failing for Python PreCommit (Resolved)
  - BEAM-12713 beam_PostCommit_NightlySnapshot runMobileGamingJavaDataflow failing (Resolved)
  - BEAM-12723 beam_PreCommit_Java_Examples_Dataflow_Phrase failing due to key negotiation error (Resolved)
- is related to
  - BEAM-12713 beam_PostCommit_NightlySnapshot runMobileGamingJavaDataflow failing (Resolved)