  1. Beam
  2. BEAM-12669

UpdateDestinationSchema PTransform does not respect source format

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: 2.30.0
    • Fix Version/s: 2.33.0
    • Component/s: io-go-gcp, runner-dataflow
    • Labels: None

    Description

      When multiple load jobs are needed to write data to a destination table (for example, when the data is spread over more than 10,000 URIs), WriteToBigQuery in FILE_LOADS mode writes the data into temporary tables and then updates the temporary tables if schema additions are allowed.
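To illustrate why more than one load job can be needed, here is a minimal sketch that splits a list of file URIs into batches no larger than the 10,000-URI figure mentioned above. The helper `partition_uris`, the constant name, and the example bucket path are hypothetical and are not Beam's actual implementation.

```python
from typing import Iterator, List

# Per-load-job URI limit cited in the description above.
MAX_SOURCE_URIS = 10_000


def partition_uris(
    uris: List[str], max_uris: int = MAX_SOURCE_URIS
) -> Iterator[List[str]]:
    """Yield consecutive batches of at most max_uris URIs (hypothetical helper)."""
    for start in range(0, len(uris), max_uris):
        yield uris[start:start + max_uris]


# Hypothetical input: 25,000 temporary files waiting to be loaded.
uris = [f"gs://example-bucket/tmp/file-{i}.avro" for i in range(25_000)]
batches = list(partition_uris(uris))

# More than one batch means more than one load job, which is the case
# where temporary tables come into play.
needs_temp_tables = len(batches) > 1
```

With 25,000 URIs the sketch produces three batches, so the write would go through the temporary-table path described in this issue.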

      However, the update of the temporary table schema does not respect the specified source format of the files being loaded (e.g., JSON or AVRO).

      UpdateDestinationSchema issues the schema modification job with the default CSV source format, which causes AVRO or JSON loads with a nested schema to fail with the error:

      apache_beam.io.gcp.bigquery_file_loads: INFO: Triggering schema modification job beam_bq_job_LOAD_satybald7_SCHEMA_MOD_STEP_994_3869e4dc1dd08c68d20fd047e242161a_7c553f684cce4963a75d669f38a4ec46 on <TableReference
       datasetId: 'python_write_to_table_1627431111435'
       projectId: 'DELETED'
       tableId: 'python_append_schema_update'>
      apache_beam.io.gcp.bigquery_tools: INFO: Failed to insert job <JobReference
       jobId: 'beam_bq_job_LOAD7_SCHEMA_MOD_STEP_994_3869e4dc1dd08c68d20fd047e242161a_7c553f684cce4963a75d669f38a4ec46'
       projectId: 'DELETED'>: HttpError accessing ....
      
       'content-type': 'application/json; charset=UTF-8', 'content-length': '332', 'date': 'Wed, 28 Jul 2021 00:12:03 GMT', 'server': 'UploadServer', 'status': '400'}>, content <{
        "error": {
          "code": 400,
          "message": "Cannot load CSV data with a nested schema. Field: nested_field",
          "errors": [
            {
              "message": "Cannot load CSV data with a nested schema. Field: nested_field",
              "domain": "global",
              "reason": "invalid"
            }
          ],
          "status": "INVALID_ARGUMENT"
        }
      }
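The failure mode can be sketched with plain dictionaries shaped like the BigQuery REST API load configuration. This is only an illustration under stated assumptions, not Beam's code: the helpers `schema_mod_load_config` and `effective_source_format` are hypothetical, the inner field name `inner` is invented for the example, and the empty `sourceUris` list is an assumption about how a schema-only job might be submitted. The field names (`sourceFormat`, `schemaUpdateOptions`, `writeDisposition`) and the CSV default for an omitted `sourceFormat` come from the BigQuery load job configuration.

```python
from typing import Any, Dict, List, Optional


def schema_mod_load_config(
    destination: Dict[str, str],
    schema_fields: List[Dict[str, Any]],
    source_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a REST-style load configuration for a schema update (hypothetical helper)."""
    load: Dict[str, Any] = {
        "destinationTable": destination,
        "sourceUris": [],  # assumption: schema-only job, no data files
        "schema": {"fields": schema_fields},
        "writeDisposition": "WRITE_APPEND",
        "schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"],
    }
    # Only set sourceFormat when the caller specifies one; otherwise the
    # service falls back to its default.
    if source_format is not None:
        load["sourceFormat"] = source_format
    return {"load": load}


def effective_source_format(config: Dict[str, Any]) -> str:
    """Return the format BigQuery would assume: CSV when sourceFormat is omitted."""
    return config["load"].get("sourceFormat", "CSV")


destination = {
    "projectId": "DELETED",
    "datasetId": "python_write_to_table_1627431111435",
    "tableId": "python_append_schema_update",
}
nested_schema = [
    {
        "name": "nested_field",
        "type": "RECORD",
        "fields": [{"name": "inner", "type": "STRING"}],  # hypothetical inner field
    }
]

# Behaviour reported in this issue: no format is passed, so CSV is assumed
# and the nested schema is rejected.
buggy = schema_mod_load_config(destination, nested_schema)

# Passing the pipeline's source format through avoids the CSV default.
fixed = schema_mod_load_config(destination, nested_schema, source_format="AVRO")
```

In this sketch `effective_source_format(buggy)` yields `"CSV"`, matching the "Cannot load CSV data with a nested schema" error, while the second configuration carries the AVRO format explicitly.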
      

            People

              Assignee: Unassigned
              Reporter: Sayat Satybaldiyev (sayatez)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h