  Apache Airflow / AIRFLOW-5224

gcs_to_bq.GoogleCloudStorageToBigQueryOperator - Specify Encoding for BQ ingestion

    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: DAG, gcp
    • Labels:
      None
    • Environment:
      airflow software platform

      Description

      Hi,

      Our current business project is built entirely on GCP components, with Composer (Airflow) as one of the key pieces. We have built data pipelines in Airflow for multiple workstreams, where data is ingested from GCS buckets into BigQuery.

      Following recent updates on the Google BigQuery infrastructure side, validation of UTF-8 characters appears to have been tightened, which has resulted in multiple failures in our existing business processes.

      On further analysis, we found that when ingesting data into BigQuery from a GCS bucket, the encoding now needs to be specified explicitly, but the operator below currently does not accept any parameter for specifying the encoding:

      gcs_to_bq.GoogleCloudStorageToBigQueryOperator
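      To illustrate the request: the BigQuery load job configuration has an `encoding` field, and the ask is for the operator to pass a value through to it. Below is a minimal sketch of such a configuration, not the operator's actual code. The helper name `build_load_configuration`, its parameters, and the restriction to two encodings are assumptions made for illustration only.

```python
from typing import Any, Dict, List

# Assumption: only the two long-documented CSV encodings are accepted here.
SUPPORTED_ENCODINGS = ("UTF-8", "ISO-8859-1")


def build_load_configuration(
    source_uris: List[str],
    project_id: str,
    dataset_id: str,
    table_id: str,
    encoding: str = "UTF-8",
) -> Dict[str, Any]:
    """Hypothetical helper: build a load-job body with an explicit encoding."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(
            "Unsupported encoding %r; expected one of %s"
            % (encoding, ", ".join(SUPPORTED_ENCODINGS))
        )
    return {
        "configuration": {
            "load": {
                "sourceUris": source_uris,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "sourceFormat": "CSV",
                # The field this ticket asks the operator to expose.
                "encoding": encoding,
            }
        }
    }
```

      With something like this, a pipeline reading Latin-1 files could pass `encoding="ISO-8859-1"` instead of relying on the default.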

      Could someone please treat this as a priority and help us with a fix so that we can return to business as usual (BAU)?

       


            People

            • Assignee:
              Unassigned
            • Reporter:
              anandkumar_b Anand Kumar
            • Votes:
              0
            • Watchers:
              2
