Currently, JSON format is used for file loads into BigQuery in the Python SDK. JSON has some disadvantages including size of serialized data and inability to represent NaN and infinity float values.
BigQuery supports loading files in avro format, which can overcome these disadvantages. The Java SDK already supports loading files using avro format (
BEAM-2879) so it makes sense to support it in the Python SDK as well.
The change will be somewhere around BigQueryBatchFileLoads.