Beam / BEAM-11230

ReadFromBigQuery fails when the table has repeated records


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.25.0
    • Fix Version/s: 2.27.0
    • Component/s: sdk-py-core
    • Labels:
      None

      Description

      This is very similar to the issue reported in BEAM-10524: https://issues.apache.org/jira/browse/BEAM-10524

      After I upgraded the Python SDK from 2.24 to 2.25, ReadFromBigQuery started failing with this stack trace:

       

      ....
        File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
          work_executor.execute()
        File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
          op.start()
        File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
        File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
        File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
        File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 89, in read
          range_tracker.sub_range_tracker(source_ix)):
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 210, in read_records
          yield self._coder.decode(record)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 633, in decode
          return self._decode_with_schema(value, self.fields)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 656, in _decode_with_schema
          value[field.name] = converter(value[field.name])
      TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

      According to BEAM-10524, this should have been fixed in 2.25, but in my case the opposite happened: the read only started failing after the upgrade to 2.25.

      Relevant code (the line that raises the error): https://github.com/apache/beam/blob/release-2.25.0/sdks/python/apache_beam/io/gcp/bigquery.py#L656
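The traceback points at the mechanism: `_decode_with_schema` looks up a scalar converter by field type (here `int`) and applies it to the whole value, but for a field whose mode is `REPEATED` the exported JSON value is a list, so `int(list)` raises `TypeError`. The sketch below is a simplified, hypothetical re-creation of that decoding step in plain Python, with no Beam import. `Field`, `CONVERTERS`, `decode_naive` and `decode_with_mode` are illustrative names, not Beam's API, and the sample row assumes that integers arrive as JSON strings in the BigQuery export. It shows one mode-aware way to decode such rows; it is not necessarily how the 2.27.0 fix was implemented.

```python
import json
from typing import Any, Callable, Dict, List, NamedTuple, Optional


class Field(NamedTuple):
    """Hypothetical stand-in for one BigQuery schema field."""
    name: str
    type: str
    mode: str = "NULLABLE"
    fields: Optional[List["Field"]] = None  # sub-fields, RECORD only


# Scalar converters keyed by BigQuery type (illustrative subset only).
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "INTEGER": int,
    "INT64": int,
    "FLOAT": float,
    "FLOAT64": float,
}


def decode_naive(row: Dict[str, Any], fields: List[Field]) -> Dict[str, Any]:
    """Mode-unaware: applies the scalar converter to whatever value is present."""
    for field in fields:
        converter = CONVERTERS.get(field.type)
        if converter is not None and field.name in row:
            # For a REPEATED field this calls e.g. int(["1", "2"]) -> TypeError.
            row[field.name] = converter(row[field.name])
    return row


def decode_with_mode(row: Dict[str, Any], fields: List[Field]) -> Dict[str, Any]:
    """Mode-aware: converts element by element when the field is REPEATED."""
    for field in fields:
        value = row.get(field.name)
        if value is None:
            # Missing or null in the exported row; normalize to None.
            row[field.name] = None
            continue
        if field.type in ("RECORD", "STRUCT"):
            sub_fields = field.fields or []
            if field.mode == "REPEATED":
                row[field.name] = [decode_with_mode(v, sub_fields) for v in value]
            else:
                row[field.name] = decode_with_mode(value, sub_fields)
            continue
        converter = CONVERTERS.get(field.type)
        if converter is None:
            continue  # e.g. STRING: keep the JSON value as-is
        if field.mode == "REPEATED":
            row[field.name] = [converter(v) for v in value]
        else:
            row[field.name] = converter(value)
    return row


# A REPEATED INTEGER column next to a REPEATED RECORD column.
SCHEMA = [
    Field("id", "INTEGER"),
    Field("scores", "INTEGER", mode="REPEATED"),
    Field("items", "RECORD", mode="REPEATED",
          fields=[Field("qty", "INTEGER"), Field("price", "FLOAT")]),
]
# One exported row; integers are assumed to arrive as JSON strings.
LINE = '{"id": "7", "scores": ["1", "2"], "items": [{"qty": "3", "price": 1.5}]}'

try:
    decode_naive(json.loads(LINE), SCHEMA)
except TypeError as err:
    print("naive decoder failed:", err)

print(decode_with_mode(json.loads(LINE), SCHEMA))
# {'id': 7, 'scores': [1, 2], 'items': [{'qty': 3, 'price': 1.5}]}
```

Checking `field.mode` before converting keeps scalar fields, repeated scalar fields and repeated records on separate paths. The traceback indicates that this distinction was missing for the failing field in 2.25.0.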

       


              People

               • Assignee: Kamil Wasilewski (kamilwu)
               • Reporter: Gomez Covella Alvaro
               • Votes: 0
               • Watchers: 2


                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 1h 20m