Details
Type: Bug
Status: Open
Priority: P3
Resolution: Unresolved
Affects Version/s: 2.29.0, 2.33.0
Description
My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The initial run succeeds, but any subsequent run fails with the error "google.api_core.exceptions.NotFound: 404 Session not found" and also with "504 Deadline Exceeded".
Here is part of the code:
SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'

spanner_domains = (
    p
    | 'ReadFromSpanner' >> ReadFromSpanner(
        project_id, database, database, sql=SPANNER_QUERY)
    | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))

def _KeyDomainSpanner(entity):
    row = {}
    for i, column in enumerate(['row_id', 'update_key']):
        row[column] = entity[i]
    return row['row_id'], row
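For anyone reproducing this, the keying step can be checked on its own, without Beam or Spanner. The sketch below is a standalone copy of the function above under a hypothetical name (key_domain_spanner); the sample row values are made up for illustration and are not from the real table.

```python
def key_domain_spanner(entity):
    """Standalone copy of _KeyDomainSpanner: maps a positional row to (row_id, row_dict)."""
    columns = ['row_id', 'update_key']
    # Pair each selected column name with the value at the same position in the row.
    row = {column: entity[i] for i, column in enumerate(columns)}
    return row['row_id'], row


# Hypothetical sample row in the order of the SELECT list (row_id, update_key).
sample_row = [42, 'abc']
keyed = key_domain_spanner(sample_row)
print(keyed)  # (42, {'row_id': 42, 'update_key': 'abc'})
```

This only shows the shape of the elements leaving 'KeyDomainsSpanner'; it does not exercise the Spanner read where the errors occur.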
The Dataflow job is able to read around 10M rows with 2.29.0, but only a few thousand with 2.33.0.