Beam / BEAM-8841

Add ability to perform BigQuery file loads using Avro


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.21.0
    • Component/s: io-py-gcp
    • Labels:
      None

      Description

      Currently, the Python SDK uses JSON as the file format for file loads into BigQuery. JSON has some disadvantages, including the size of the serialized data and the inability to represent NaN and infinity float values.

      BigQuery supports loading files in Avro format, which overcomes these disadvantages. The Java SDK already supports loading files in Avro format (BEAM-2879), so it makes sense to support it in the Python SDK as well.
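
      The float limitation described above can be reproduced with the Python standard library alone. The following is a minimal sketch, not code from Beam: the helper name `strict_json_fails` and the sample values are made up for illustration, and `struct` only stands in for a binary encoder (the Avro specification stores a double as 8 little-endian IEEE 754 bytes, which is the layout packed here).

```python
import json
import math
import struct

# Hypothetical sample row values: one ordinary float, NaN, and +infinity.
values = [1.5, float("nan"), float("inf")]


def strict_json_fails(vals):
    """Return True if standards-compliant JSON serialization rejects vals."""
    try:
        # allow_nan=False forbids the non-standard NaN/Infinity tokens.
        json.dumps(vals, allow_nan=False)
    except ValueError:
        return True
    return False


# By default, json.dumps emits the tokens NaN and Infinity, which are not
# part of the JSON grammar, so a strict consumer may reject them.
lenient_text = json.dumps(values)

# A fixed-width binary encoding (8 bytes per double, little-endian)
# round-trips the special values without loss.
packed = struct.pack("<3d", *values)
unpacked = struct.unpack("<3d", packed)

print(strict_json_fails(values))      # whether strict JSON rejects the row
print(lenient_text)                   # the non-standard text Python emits
print(len(packed), math.isnan(unpacked[1]), unpacked[2])
```

      Each double also occupies exactly 8 bytes in the binary form, whereas its JSON text length depends on the number of digits printed.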

      The change will likely be in or near BigQueryBatchFileLoads.

            People

            • Assignee: Chun Yang (cccyang)
            • Reporter: Chun Yang (cccyang)
            • Votes: 0
            • Watchers: 2

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 9h 40m