Apache Airflow / AIRFLOW-2842

GCS rsync operator

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gcp
    • Labels:

      Description

      The GoogleCloudStorageToGoogleCloudStorageOperator supports copying objects from one bucket to another using a wildcard.

      As long as you don't delete anything in the source bucket, the destination bucket will end up synchronized on every run.

      However, each object gets copied over even if it exists at the destination, which makes this operation inefficient, time-consuming, and potentially costly.

      I'd love an operator that behaves like `gsutil rsync` for when I need to synchronize two buckets, supporting `gsutil rsync -d` behavior as well.
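
      To make the request concrete, here is a minimal sketch of the comparison step such an operator would need. It is not an existing Airflow API: `plan_sync`, `SyncPlan`, and `delete_extra` are hypothetical names, and the sketch assumes each bucket listing is already available as a mapping from object name to a content hash (for example the CRC32C or MD5 value that GCS reports for an object). Only objects that are missing or different get copied, and `delete_extra=True` mirrors the `gsutil rsync -d` behavior.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    """Hypothetical result type: what to copy and what to delete."""
    to_copy: List[str]
    to_delete: List[str]


def plan_sync(
    source: Dict[str, str],
    dest: Dict[str, str],
    delete_extra: bool = False,
) -> SyncPlan:
    """Compare two listings of {object name: content hash}.

    Assumption: the caller has already listed both buckets and collected
    one hash per object. This function performs no GCS calls itself.
    """
    # Copy objects that are missing at the destination or whose hash differs.
    to_copy = sorted(
        name for name, checksum in source.items()
        if dest.get(name) != checksum
    )
    # Like `gsutil rsync -d`: optionally remove destination objects
    # that no longer exist in the source.
    to_delete = (
        sorted(name for name in dest if name not in source)
        if delete_extra
        else []
    )
    return SyncPlan(to_copy=to_copy, to_delete=to_delete)


if __name__ == "__main__":
    source = {"a.csv": "h1", "b.csv": "h2", "c.csv": "h3"}
    dest = {"a.csv": "h1", "b.csv": "old", "stale.csv": "h9"}
    print(plan_sync(source, dest, delete_extra=True))
```

      In this example `a.csv` is skipped because it is unchanged, `b.csv` and `c.csv` are copied, and `stale.csv` is deleted only because `delete_extra` is set. An actual operator would feed the resulting plan to the GCS copy and delete calls.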

              People

              • Assignee: Unassigned
              • Reporter: Vikram Oberoi (vikramo)
              • Votes: 2
              • Watchers: 7
