Beam / BEAM-8841

Add ability to perform BigQuery file loads using Avro

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.21.0
    • Component/s: io-py-gcp
    • Labels: None

      Description

      Currently, the Python SDK uses the JSON format for file loads into BigQuery. JSON has some disadvantages, including the size of the serialized data and the inability to represent NaN and infinity float values.
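
The NaN and infinity limitation can be reproduced with Python's standard `json` module alone. The following is a minimal sketch, not Beam code; the helper name `strict_json_or_error` and the sample row are hypothetical. With `allow_nan=False` the encoder follows the JSON specification and rejects these values, while the default mode emits the tokens `NaN` and `Infinity`, which are not part of the JSON grammar.

```python
import json
import math


def strict_json_or_error(value):
    """Serialize value as strictly valid JSON, or return the error message."""
    try:
        return json.dumps(value, allow_nan=False)
    except ValueError as exc:
        return "error: {}".format(exc)


# Hypothetical row containing float values that JSON cannot express.
row = {"id": 1, "score": math.nan, "limit": math.inf}

# Default mode emits NaN / Infinity, which are not valid JSON.
lenient = json.dumps(row)

# Strict mode refuses to serialize the row at all.
strict = strict_json_or_error(row)

print(lenient)
print(strict)
```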

      BigQuery supports loading files in Avro format, which overcomes these disadvantages. The Java SDK already supports loading files in Avro format (BEAM-2879), so it makes sense to support it in the Python SDK as well.
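
As a rough illustration of why a binary format avoids the float problem: the Avro specification encodes a `double` as 8 bytes of little-endian IEEE 754, which has bit patterns for NaN and both infinities. The sketch below reproduces only that single-value layout with Python's `struct` module. It is not an Avro writer, it uses neither Beam nor an Avro library, and the function names are hypothetical.

```python
import json
import math
import struct


def encode_double(value):
    """Pack a float as 8 little-endian IEEE 754 bytes (Avro's double layout)."""
    return struct.pack("<d", value)


def decode_double(data):
    """Unpack 8 little-endian IEEE 754 bytes back into a float."""
    return struct.unpack("<d", data)[0]


for value in (math.nan, math.inf, -math.inf, 0.1234567890123456):
    encoded = encode_double(value)
    decoded = decode_double(encoded)
    # Size of the same value written as (lenient) JSON text, for comparison.
    json_size = len(json.dumps(value))
    print(repr(decoded), len(encoded), json_size)
```

Every value occupies exactly 8 bytes and survives the round trip, including NaN and the infinities. For a single short number the JSON text can be smaller; the size advantage for whole files comes mainly from Avro storing the schema once in the file header rather than repeating field names in every row.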

      The change will likely be in or near BigQueryBatchFileLoads.

              People

              • Assignee: Chun Yang (cccyang)
              • Reporter: Chun Yang (cccyang)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 8h