  1. Beam
  2. BEAM-8965

WriteToBigQuery failed in BundleBasedDirectRunner

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.16.0, 2.17.0, 2.18.0, 2.19.0
    • Fix Version/s: 2.20.0
    • Component/s: io-py-gcp
    • Labels: None

    Description

      WriteToBigQuery fails in BundleBasedDirectRunner with the error "PCollection of size 2 with more than one element accessed as a singleton view."

      Here is the code

       

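      import apache_beam as beam
      from apache_beam.io.gcp.bigquery import WriteToBigQuery

      # The failure was reported under BundleBasedDirectRunner (see the issue title).
      with beam.Pipeline() as p:
          # Read rows from BigQuery; the query is elided in the original report.
          query_results = (
              p
              | beam.io.Read(beam.io.BigQuerySource(
                  query='SELECT ... FROM ...'))
          )
          # Write the rows back using the file-loads method, which triggers the error.
          query_results | WriteToBigQuery(
              table='<your_table_name>',  # placeholder: substitute a real table
              method=WriteToBigQuery.Method.FILE_LOADS,
              schema={"fields": []}
          )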
      

       

      Here is the error

       

        File "apache_beam/runners/common.py", line 778, in apache_beam.runners.common.DoFnRunner.process
          def process(self, windowed_value):
        File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
          self._reraise_augmented(exn)
        File "apache_beam/runners/common.py", line 849, in apache_beam.runners.common.DoFnRunner._reraise_augmented
          raise_with_traceback(new_exn)
        File "apache_beam/runners/common.py", line 780, in apache_beam.runners.common.DoFnRunner.process
          return self.do_fn_invoker.invoke_process(windowed_value)
        File "apache_beam/runners/common.py", line 587, in apache_beam.runners.common.PerWindowInvoker.invoke_process
          self._invoke_process_per_window(
        File "apache_beam/runners/common.py", line 610, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
          [si[global_window] for si in self.side_inputs]))
        File "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py", line 65, in __getitem__
          _FilteringIterable(self._iterable, target_window), self._view_options)
        File "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py", line 443, in _from_runtime_iterable
          len(head), str(head[0]), str(head[1])))
      ValueError: PCollection of size 2 with more than one element accessed as a singleton view. First two elements encountered are "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
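
      The traceback ends in a singleton side-input check: the side input is expected to hold exactly one element, and the check raises as soon as it sees a second one. The sketch below is a simplified, standalone illustration of that kind of check, not Beam's actual code. The function name `singleton_from_iterable` and the `default` parameter are hypothetical. It assumes the check only peeks at the first two elements, which would explain why the message says "size 2" regardless of the real size, and why two identical paths still count as two elements.

```python
import itertools

_MISSING = object()  # sentinel meaning "no default supplied" (hypothetical)


def singleton_from_iterable(elements, default=_MISSING):
    """Return the only element of `elements`, or raise if there is more than one.

    Illustrative stand-in for a singleton side-input view; not Beam's API.
    """
    # Peek at no more than two elements; that is enough to tell
    # "empty", "exactly one", and "more than one" apart.
    head = list(itertools.islice(elements, 2))
    if not head:
        if default is _MISSING:
            raise ValueError("Empty input accessed as a singleton view.")
        return default
    if len(head) == 1:
        return head[0]
    # len(head) is 2 here even if the input holds many more elements.
    raise ValueError(
        'PCollection of size %d with more than one element accessed as a '
        'singleton view. First two elements encountered are "%s", "%s".'
        % (len(head), head[0], head[1]))


# Path copied from the error message in the report.
path = "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f"

# One element: the view resolves to that element.
assert singleton_from_iterable([path]) == path

# The same path emitted twice: the check fails even though the values match.
try:
    singleton_from_iterable([path, path])
except ValueError as exc:
    print(exc)
```

      Under these assumptions, the duplicated path in the error suggests that the step producing the temporary file location emitted its value twice, rather than that two different locations were configured.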
      

       

       

       

       

            People

              Assignee: Wenbing Bai
              Reporter: Wenbing Bai

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2.5h