  1. Beam
  2. BEAM-11230

ReadFromBigQuery fails when the table has repeated records

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.25.0
    • Fix Version/s: 2.27.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      This is very similar to the issue reported in BEAM-10524: https://issues.apache.org/jira/browse/BEAM-10524

      I upgraded the Python SDK from 2.24 to 2.25, and ReadFromBigQuery started failing with this stack trace:

       

      ....
      
      "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
          work_executor.execute()
        File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
          op.start()
        File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
        File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
        File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
        File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 89, in read
          range_tracker.sub_range_tracker(source_ix)):
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 210, in read_records
          yield self._coder.decode(record)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 633, in decode
          return self._decode_with_schema(value, self.fields)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 656, in _decode_with_schema
          value[field.name] = converter(value[field.name])
      TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

      According to the aforementioned issue, this should be fixed in 2.25, but in my case it is the opposite: the read worked on 2.24 and fails on 2.25.

      Code: https://github.com/apache/beam/blob/release-2.25.0/sdks/python/apache_beam/io/gcp/bigquery.py#L656
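
To illustrate what the TypeError above suggests, here is a minimal sketch of the failure mode. It is not Beam's code: `Field`, `CONVERTERS`, `naive_decode` and `mode_aware_decode` are hypothetical names, and the sketch assumes that exported rows carry integers as strings and that a REPEATED field arrives as a list. Applying a scalar converter such as `int` directly to a REPEATED field raises the same kind of error, while a decoder that checks the field mode converts each element.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for a BigQuery schema field; not Beam's own class.
@dataclass
class Field:
    name: str
    type: str
    mode: str = "NULLABLE"
    fields: List["Field"] = field(default_factory=list)


# Scalar converters keyed by BigQuery type name (illustrative subset).
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "INTEGER": int,
    "FLOAT": float,
}


def naive_decode(row: dict, schema: List[Field]) -> dict:
    """Applies the converter to the raw value and ignores the field mode."""
    for f in schema:
        converter = CONVERTERS.get(f.type)
        if converter is not None and row.get(f.name) is not None:
            # For a REPEATED field the value is a list, so int(list) raises.
            row[f.name] = converter(row[f.name])
    return row


def mode_aware_decode(row: dict, schema: List[Field]) -> dict:
    """Converts element by element for REPEATED fields; recurses into RECORDs."""
    for f in schema:
        value = row.get(f.name)
        if value is None:
            continue
        if f.type == "RECORD":
            if f.mode == "REPEATED":
                row[f.name] = [mode_aware_decode(v, f.fields) for v in value]
            else:
                row[f.name] = mode_aware_decode(value, f.fields)
            continue
        converter = CONVERTERS.get(f.type)
        if converter is None:
            continue  # Types without a converter are passed through unchanged.
        if f.mode == "REPEATED":
            row[f.name] = [converter(v) for v in value]
        else:
            row[f.name] = converter(value)
    return row


# Illustrative schema: one scalar integer and one repeated integer.
schema = [
    Field("id", "INTEGER"),
    Field("scores", "INTEGER", mode="REPEATED"),
]

if __name__ == "__main__":
    try:
        naive_decode({"id": "1", "scores": ["10", "20"]}, schema)
    except TypeError as exc:
        print("naive_decode failed:", exc)
    print(mode_aware_decode({"id": "1", "scores": ["10", "20"]}, schema))
```

Whether the actual fix takes this shape is not stated in this report; the sketch only shows why a list reaching `int()` produces the error in the stack trace.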

       


            People

              Assignee: Kamil Wasilewski (kamilwu)
              Reporter: Gomez Covella Alvaro

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m