Apache NiFi / NIFI-6313

PutGCSObject performance seems slow


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.2
    • Fix Version/s: None
    • Component/s: Core Framework, Extensions
    • Labels: None

    Description

      The PutGCSObject processor, which transfers files to a Google Cloud Storage (GCS) bucket, has poor transfer speeds.

      It is impossible to give hard throughput figures, because throughput seems to depend on:

      - the network location of the NiFi node (inside Google Cloud or not)

      - network bandwidth

      - the NiFi node specs

       

      Benchmarks on multiple NiFi clusters (ranging from test setups to production sites) showed throughput anywhere from 8 MB/s to 800 MB/s.

      "Slow" here means slow compared with gsutil. Running gsutil directly on the same NiFi node gives 5 to 8 times higher throughput without 'parallel_composite_upload', and up to 16 times higher throughput with 'parallel_composite_upload'.
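      To put these factors in wall-clock terms, here is a worked example that assumes a hypothetical 1024 MB object and takes the 8 MB/s low end of the measured range as the PutGCSObject rate. The transfer time is t = S / R for object size S and throughput R:

```latex
\begin{align*}
t_{\text{PutGCSObject}}         &= \frac{1024\ \text{MB}}{8\ \text{MB/s}}   = 128\ \text{s} \\
t_{\text{gsutil},\,5\times}     &= \frac{1024\ \text{MB}}{40\ \text{MB/s}}  = 25.6\ \text{s} \\
t_{\text{gsutil},\,8\times}     &= \frac{1024\ \text{MB}}{64\ \text{MB/s}}  = 16\ \text{s} \\
t_{\text{gsutil},\,16\times}    &= \frac{1024\ \text{MB}}{128\ \text{MB/s}} = 8\ \text{s}
\end{align*}
```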

       

      The Google Cloud Java client library on which NiFi's GCS processors are built does not have the same optimizations as gsutil, and may not be actively supported or maintained. The Storage.create method is even deprecated.
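      For context: in the google-cloud-storage Java client, the Javadoc of the deprecated Storage.create(BlobInfo, InputStream, BlobWriteOption...) recommends Storage.writer(BlobInfo, BlobWriteOption...) for large content. That method returns a WriteChannel, whose setChunkSize(int) controls the size of each upload request. Because WriteChannel extends java.nio.channels.WritableByteChannel, a chunked copy loop can be written against the standard interface. The sketch below is a hypothetical helper (ChunkedCopy and copyInChunks are illustrative names, not NiFi code), exercised here with an in-memory channel rather than a real GCS connection:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class ChunkedCopy {

    /**
     * Hypothetical helper: copies an InputStream into any WritableByteChannel
     * using a buffer of chunkSize bytes and returns the number of bytes copied.
     * Assumption (not tested here): with GCS, 'out' could be the WriteChannel
     * returned by Storage.writer(...), ideally with a matching setChunkSize.
     */
    static long copyInChunks(InputStream in, WritableByteChannel out, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            ByteBuffer bb = ByteBuffer.wrap(buf, 0, n);
            // A channel may accept fewer bytes than offered, so drain the buffer fully.
            while (bb.hasRemaining()) {
                out.write(bb);
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel ch = Channels.newChannel(sink)) {
            long copied = copyInChunks(new ByteArrayInputStream(data), ch, 4096);
            System.out.println("copied " + copied + " bytes");
        }
    }
}
```

      Writing against WritableByteChannel keeps the chunk size an explicit, tunable parameter, which is the kind of knob that could be exposed as a processor property.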

      Still, there must be ways to speed up transfers to GCS, for example by implementing parallel composite uploads in chunks and adding configuration options for this to the GCS processors.
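      A parallel composite upload, as gsutil performs it, splits the data into parts, uploads each part concurrently as a temporary object, composes the parts into the final object, and then deletes the temporaries. GCS accepts at most 32 source objects in a single compose request. The sketch below illustrates that flow under stated assumptions: ObjectStore is a hypothetical interface (with the real client, compose would map to Storage.compose), and InMemoryStore stands in for a bucket so the logic can run without network access. The partCount and threads parameters are where hypothetical processor configuration options could plug in.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelCompositeUploadSketch {

    // GCS accepts at most 32 source objects per compose request.
    static final int MAX_COMPOSE_SOURCES = 32;

    /** Hypothetical object-store abstraction; not a real GCS client API. */
    interface ObjectStore {
        void put(String name, byte[] data);
        void compose(List<String> sourceNames, String targetName);
        void delete(String name);
    }

    /** In-memory stand-in for a bucket, used only to exercise the sketch. */
    static class InMemoryStore implements ObjectStore {
        final Map<String, byte[]> objects = new ConcurrentHashMap<>();
        public void put(String name, byte[] data) { objects.put(name, data.clone()); }
        public void compose(List<String> sources, String target) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (String s : sources) {
                byte[] b = objects.get(s);
                out.write(b, 0, b.length); // concatenate parts in order
            }
            objects.put(target, out.toByteArray());
        }
        public void delete(String name) { objects.remove(name); }
    }

    /** Splits [0, size) into at most partCount contiguous ranges, capped at 32. */
    static List<int[]> planParts(int size, int partCount) {
        int parts = Math.max(1, Math.min(partCount, MAX_COMPOSE_SOURCES));
        int partSize = (size + parts - 1) / parts; // ceiling division
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < size; start += partSize) {
            ranges.add(new int[] {start, Math.min(size, start + partSize)});
        }
        if (ranges.isEmpty()) {
            ranges.add(new int[] {0, 0}); // an empty object is a single empty part
        }
        return ranges;
    }

    /** Uploads parts concurrently, composes them into 'target', then removes the parts. */
    static void upload(ObjectStore store, String target, byte[] data, int partCount, int threads) throws Exception {
        List<int[]> ranges = planParts(data.length, partCount);
        List<String> partNames = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < ranges.size(); i++) {
                int[] r = ranges.get(i);
                String partName = target + ".part-" + i; // hypothetical temp-object naming
                partNames.add(partName);
                byte[] slice = Arrays.copyOfRange(data, r[0], r[1]);
                pending.add(pool.submit(() -> store.put(partName, slice)));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any part-upload failure
            }
            store.compose(partNames, target);
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES); // let in-flight parts finish before cleanup
            for (String p : partNames) {
                store.delete(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1_000_000];
        new Random(42).nextBytes(data);
        InMemoryStore store = new InMemoryStore();
        upload(store, "example-object", data, 8, 4);
        System.out.println(Arrays.equals(data, store.objects.get("example-object")));
    }
}
```

      Capping the part count at 32 keeps the sketch to a single compose call; larger objects would then use larger parts, or a multi-level compose, which is not shown here.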


    People

      Assignee: Unassigned
      Reporter: Jasper Knulst
      Votes: 0
      Watchers: 2
