  1. Beam
  2. BEAM-12773

404 Session not found, when querying Google Cloud Spanner with Python Dataflow.

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.29.0, 2.33.0
    • Fix Version/s: None
    • Component/s: io-py-gcp
    • Labels: None

    Description

      My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The initial run succeeds, but every subsequent run fails with "google.api_core.exceptions.NotFound: 404 Session not found" and also "504 Deadline Exceeded".

      Here is part of the code:

      
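      import apache_beam as beam
      from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner

      SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'

      def _KeyDomainSpanner(entity):
        # Turn a positional result row into (row_id, row_dict).
        row = {}
        for i, column in enumerate(['row_id', 'update_key']):
          row[column] = entity[i]
        return row['row_id'], row

      # p, project_id and database are defined elsewhere in the pipeline.
      spanner_domains = (
            p
            | 'ReadFromSpanner' >> ReadFromSpanner(
                # Positional arguments are project_id, instance_id, database_id;
                # the values are passed here exactly as in the original report.
                project_id, database, database, sql=SPANNER_QUERY)
            | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))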
      

      The Dataflow job is able to read around 10M rows with 2.29.0, but only a few thousand with 2.33.0.
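To help rule out the keying step as a source of the failure, the mapping from the snippet above can be exercised locally without Spanner or Beam. This is a minimal sketch: the name `key_domain_spanner`, the `COLUMNS` constant, and the sample rows are made up for illustration, and it assumes each result row arrives as a positional sequence in SELECT column order.

```python
# Column order is assumed to match the SELECT list in SPANNER_QUERY.
COLUMNS = ['row_id', 'update_key']


def key_domain_spanner(entity):
    """Return (row_id, row_dict) for one positional result row."""
    row = {column: entity[i] for i, column in enumerate(COLUMNS)}
    return row['row_id'], row


# Made-up sample rows standing in for what the Spanner read would emit.
sample_rows = [[1, 'a'], [2, 'b']]
keyed = [key_domain_spanner(r) for r in sample_rows]
print(keyed)
```

If this behaves as expected on sample data, the errors are more likely to come from the read step than from the `beam.Map` that follows it.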

      Attachments

        1. dataflow_inprogress_2.29.0.png
          352 kB
          Reto Egeter
        2. dataflow_spanner_error_2.29.0.png
          479 kB
          Reto Egeter
        3. dataflow_spanner_error_2.33.0.png
          473 kB
          Reto Egeter


          People

            Assignee: Unassigned
            Reporter: Reto Egeter (regeter)
            Votes: 0
            Watchers: 4
