Beam / BEAM-10113

PubsubIO.readMessagesWithMessageId() breaks the payload encoding when using DataflowRunner

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.20.0
    • Fix Version/s: Not applicable
    • Component/s: io-java-gcp
    • Environment: Running locally on macOS Catalina 10.15.4 and as a Dataflow job in GCP.

    Description

      My pipeline reads PubSub messages and parses their payloads into objects using Gson. I use PubsubIO.readMessagesWithMessageId() to get both the PubSub message and its message ID.

      I tested the pipeline thoroughly by running it with the DirectRunner on my local machine, and everything works fine. When I run it as a Dataflow job in GCP using the DataflowRunner, however, Gson can't parse the messages because the first character of the payload (the opening brace "{") is missing. This happens only with the DataflowRunner.

      I noticed that the problem no longer occurs when I use PubsubIO.readStrings() instead of PubsubIO.readMessagesWithMessageId() and take the payload string directly. Previously I had to decode the payload myself with new String(element.getPayload(), StandardCharsets.UTF_8).
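
      To help narrow down where the brace disappears, the decoding step above can be paired with a check on the raw bytes before they reach Gson. The following is a minimal JDK-only sketch, not the pipeline's actual code: the class PayloadCheck and its methods are hypothetical names, and the byte arrays in main stand in for what element.getPayload() would return.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Beam or the reporter's pipeline.
class PayloadCheck {
    // Decode the raw payload bytes the same way the report describes.
    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    // True if the first byte is '{' (0x7b), i.e. the JSON object opener is present.
    static boolean startsWithBrace(byte[] payload) {
        return payload.length > 0 && payload[0] == (byte) '{';
    }

    // Space-separated hex of the first n bytes, suitable for a log line.
    static String hexPrefix(byte[] payload, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, payload.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", payload[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for element.getPayload(): one intact, one missing the leading brace.
        byte[] intact = "{\"id\":1}".getBytes(StandardCharsets.UTF_8);
        byte[] truncated = "\"id\":1}".getBytes(StandardCharsets.UTF_8);

        for (byte[] payload : new byte[][] {intact, truncated}) {
            System.out.println(decode(payload)
                    + " startsWithBrace=" + startsWithBrace(payload)
                    + " hex=" + hexPrefix(payload, 4));
        }
    }
}
```

      Logging hexPrefix for the same message under both runners would show whether the byte 7b is already absent from the payload array on Dataflow or is lost in a later step.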

       

       


          People

            Assignee: Unassigned
            Reporter: Alexander Malyga (alexandermalyga)
            Votes: 0
            Watchers: 3
