Details
Type: Bug
Status: Triage Needed
Priority: P2
Resolution: Fixed
Description
From Slack:
I am trying to run a pipeline (defined with the Python SDK) on Dataflow that uses beam.io.ReadFromMongoDB. With very small datasets (<10 MB) it runs fine, but with slightly larger datasets (70 MB) I always get this error:
TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
See the stack trace below. Running the same pipeline on a local machine works just fine. I would highly appreciate any pointers on what this could be.
I hope this is the right channel to address this.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 218, in execute
    self._split_task)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 226, in _perform_source_split_considering_api_limits
    desired_bundle_size)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 263, in _perform_source_split
    for split in source.split(desired_bundle_size):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/mongodbio.py", line 174, in split
    bundle_end = min(stop_position, split_key_id)
TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
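For context, here is a minimal sketch of the failing comparison. It assumes the split keys arrive from MongoDB as documents of the form {'_id': ObjectId(...)} rather than bare ObjectId values, which is an assumption inferred from the operand types in the traceback, not confirmed from the mongodbio source:

    # Minimal sketch of the type mismatch seen in the traceback.
    # Assumption: split keys come back as {'_id': ObjectId(...)} documents.
    from bson.objectid import ObjectId

    stop_position = ObjectId("aaaaaaaaaaaaaaaaaaaaaaaa")
    split_key = {"_id": ObjectId("000000000000000000000000")}  # dict, not ObjectId

    try:
        # Same shape as the mongodbio.py call: min() compares the two
        # operands with '<', which is undefined between dict and ObjectId.
        bundle_end = min(stop_position, split_key)
    except TypeError as err:
        print(err)  # '<' not supported between instances of 'dict' and 'ObjectId'

    # Unwrapping the ObjectId before comparing sidesteps the TypeError:
    bundle_end = min(stop_position, split_key["_id"])

The sketch only illustrates the mismatch; whether the actual patch unwraps the key this way or normalizes the split results elsewhere is a detail of the real fix.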