  1. Apache Airflow
  2. AIRFLOW-7118

Honor schema type for mysql to gcs data pre-process

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.10.5, 1.10.6, 1.10.7, 1.10.8, 1.10.9
    • Fix Version/s: None
    • Component/s: operators
    • Labels:
      None
    • Flags:
      Patch

      Description

      After AIRFLOW-7117, we regain the ability to override the (BigQuery) schema with a custom self.schema (if set). However, SQL values are still pre-processed based on the default schema rather than the overriding schema.

      For example, mysql_to_gcs by default maps (mysql) `FIELD_TYPE.DATE` to (bigquery) `TIMESTAMP`. Suppose we want to load such a field into BigQuery as `DATE`; we would provide our own schema. Unfortunately, since convert_type() still pre-processes the SQL value based on the default schema, a (python) `date` is converted to a timestamp (https://github.com/stoynov96/airflow/blob/master/airflow/operators/mysql_to_gcs.py#L118-L119). As a result, the BigQuery API returns an error:

      Could not convert non-string JSON value to DATE type

      To fix that, we need to update convert_type() to always honor `schema_type` when pre-processing data, e.g. for DATE:

      if schema_type == "DATE" and isinstance(value, date):
          # In format of 'YYYY-[M]M-[D]D'
          return value.isoformat()
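
To show how the check above might sit alongside the existing conversions, here is a minimal standalone sketch. It is not the operator's actual code: the free function `convert_type(value, schema_type)`, the epoch-seconds fallback, and the handling of `datetime` values under a `DATE` schema are assumptions made for illustration.

```python
import calendar
from datetime import date, datetime
from decimal import Decimal


def convert_type(value, schema_type):
    """Hypothetical pre-processing step that consults schema_type first."""
    # Honor an overriding DATE schema: BigQuery expects a string here.
    if schema_type == "DATE" and isinstance(value, date):
        # Assumption: a datetime loaded into a DATE column keeps only its date part.
        if isinstance(value, datetime):
            value = value.date()
        # In format of 'YYYY-MM-DD'
        return value.isoformat()
    # Assumed default behaviour: dates and datetimes become epoch seconds (UTC).
    if isinstance(value, (datetime, date)):
        return calendar.timegm(value.timetuple())
    # Decimal is not JSON serializable, so fall back to float.
    if isinstance(value, Decimal):
        return float(value)
    return value


print(convert_type(date(2020, 3, 5), "DATE"))       # string for a DATE column
print(convert_type(date(2020, 3, 5), "TIMESTAMP"))  # number for a TIMESTAMP column
```

Checking `schema_type` before the generic `date` branch matters because `datetime` is a subclass of `date`; with the order reversed, the default conversion would still win and the DATE column would again receive a number.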

              People

              • Assignee:
                whynick1 Hongyi Wang
                Reporter:
                whynick1 Hongyi Wang