Beam / BEAM-12573

Bounded Source Reader DoFn PR #13154 causes some pipelines to fail with a PicklingError

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.33.0
    • Component/s: io-py-common
    • Labels: None

    Description

      As reported in [1], running the coders.py cookbook example [2] raises an
      exception:

      ```
      PicklingError : _pickle.PicklingError: Can't pickle <class
      '__main__.JsonCoder'>: it's not the same object as __main__.JsonCoder
      [while running 'read/Read/Map(<lambda at iobase.py:899>)']
      ```

      To reproduce the issue, run:
      ```
      python coders.py --input input.ndjson --output output.txt
      ```

      Where the input file (input.ndjson) has the same values as coders_test.py:
      ```
      {"host": ["Germany", 1], "guest": ["Italy", 0]}
      {"host": ["Germany", 1], "guest": ["Brasil", 3]}
      {"host": ["Brasil", 1], "guest": ["Italy", 0]}
      ```
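
For convenience, the input file can also be generated with a short script. This is a minimal sketch using only the standard library; the helper name `write_ndjson` is hypothetical and not part of Beam or the cookbook example. It writes the three records listed above, one JSON object per line.

```python
import json

# Records copied from the issue description; one JSON object per output line.
RECORDS = [
    {"host": ["Germany", 1], "guest": ["Italy", 0]},
    {"host": ["Germany", 1], "guest": ["Brasil", 3]},
    {"host": ["Brasil", 1], "guest": ["Italy", 0]},
]


def write_ndjson(path, records=RECORDS):
    """Write each record as a single line of JSON (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    write_ndjson("input.ndjson")
```

Running the script in the same directory as coders.py produces the input.ndjson file expected by the reproduction command.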

      I have tracked the change in behavior to [3] with git bisect. This needs further investigation to understand the issue.
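
As background on the error class itself: Python's pickle serializes a class by reference (module name plus qualified name) and raises a PicklingError when looking that name up in the module returns a different object than the one being pickled. The sketch below triggers that check with the standard library only. It does not use Beam, the module name `demo_main` and the `JsonCoder` classes are hypothetical stand-ins, and it is not a claim about what [3] actually changed, which still needs investigation.

```python
import pickle
import sys
import types

# Hypothetical module standing in for __main__; it is registered in
# sys.modules so that pickle can import it by name.
demo_module = types.ModuleType("demo_main")
sys.modules["demo_main"] = demo_module


def make_coder_class():
    """Create a new, distinct class object named demo_main.JsonCoder."""
    return type("JsonCoder", (object,), {"__module__": "demo_main"})


first = make_coder_class()
second = make_coder_class()

# The module-level name points at `second`, so `first` is now a class whose
# name resolves to a different object.
demo_module.JsonCoder = second


def pickling_error(cls):
    """Return the PicklingError message for cls, or None if pickling works."""
    try:
        pickle.dumps(cls)
    except pickle.PicklingError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("second:", pickling_error(second))  # name lookup returns the same object
    print("first: ", pickling_error(first))   # name lookup returns a different object
```

In this sketch only `first` fails to pickle, because the lookup of `demo_main.JsonCoder` returns `second`. Whether the Beam failure arises from a comparable situation with `__main__.JsonCoder` is an assumption to be checked, not a finding.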

      [1] https://lists.apache.org/thread.html/r45340bbee91a6caf798fe62d24388f645f8792cc7506351fd66adec3%40%3Cdev.beam.apache.org%3E
      [2] https://github.com/apache/beam/blob/35bac6a62f1dc548ee908cfeff7f73ffcac38e6f/sdks/python/apache_beam/examples/cookbook/coders.py
      [3] https://github.com/apache/beam/pull/13154

            People

              Assignee: tvalentyn Valentyn Tymofieiev
              Reporter: tvalentyn Valentyn Tymofieiev

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m
