  1. Camel
  2. CAMEL-13399

ZipAggregationStrategy becomes slower as the size of the zip grows

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.23.1
    • Fix Version/s: 3.0.0.RC1, 3.0.0
    • Component/s: camel-zipfile
    • Labels:
      None
    • Estimated Complexity:
      Unknown

      Description

      I have a simple route that runs on demand and archives multiple files into one zip archive.

      from("file:/path/to/source")
          .aggregate(constant(1), new ZipAggregationStrategy(true, true))
          .completionFromBatchConsumer()
          .eagerCheckCompletion()
          .to("file:/path/to/target");

      It works fine when the number of files in the source folder is relatively small.

      After adding trace logging to compare the total size of the input files with the time taken by the process, I drew the following chart (see the attached screenshot).

      That means creating a zip archive from 500 MB of files takes over 12 minutes!

      It looks like, in order to add a file, Camel extracts the existing zip archive into an input stream, adds the new file to it, and builds the zip archive again. Every added file therefore re-copies everything archived so far, which gives near-quadratic complexity overall and is not acceptable for large folders.
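To make the suspected cost concrete, here is a minimal sketch in plain java.util.zip. It is not Camel's code: the class and method names are hypothetical, and it only assumes the behaviour guessed at above (rebuild the whole archive for every added file). It counts uncompressed payload bytes written when the archive is rebuilt per file versus when a single ZipOutputStream stays open for all files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; this is not Camel's ZipAggregationStrategy code.
final class ZipRebuildCostSketch {

    // Adds one entry by copying every existing entry into a brand-new archive.
    // written[0] accumulates the uncompressed payload bytes written so far.
    static byte[] appendByRebuild(byte[] archive, String name, byte[] data, long[] written)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            if (archive.length > 0) {
                try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
                    ZipEntry entry;
                    while ((entry = zis.getNextEntry()) != null) {
                        byte[] old = zis.readAllBytes(); // reads to the end of this entry
                        zos.putNextEntry(new ZipEntry(entry.getName()));
                        zos.write(old);
                        zos.closeEntry();
                        written[0] += old.length;
                    }
                }
            }
            zos.putNextEntry(new ZipEntry(name));
            zos.write(data);
            zos.closeEntry();
            written[0] += data.length;
        }
        return out.toByteArray();
    }

    // Rebuilds the archive once per file: total work grows with the square of the file count.
    static long bytesWrittenByRebuild(int files, int size) {
        try {
            long[] written = {0};
            byte[] archive = new byte[0];
            for (int i = 0; i < files; i++) {
                archive = appendByRebuild(archive, "file-" + i + ".bin", new byte[size], written);
            }
            return written[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Keeps one ZipOutputStream open: every file is written exactly once.
    static long bytesWrittenByStreaming(int files, int size) {
        long written = 0;
        try (ZipOutputStream zos = new ZipOutputStream(new ByteArrayOutputStream())) {
            for (int i = 0; i < files; i++) {
                zos.putNextEntry(new ZipEntry("file-" + i + ".bin"));
                zos.write(new byte[size]);
                zos.closeEntry();
                written += size;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20, 40}) {
            System.out.println(n + " files: rebuild=" + bytesWrittenByRebuild(n, 1024)
                    + " bytes, streaming=" + bytesWrittenByStreaming(n, 1024) + " bytes");
        }
    }
}
```

For n files of s bytes each, the rebuild variant writes s * n * (n + 1) / 2 payload bytes while the streaming variant writes s * n, so doubling the number of files roughly quadruples the work in the first case and only doubles it in the second.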

      The workaround is to add completionSize or completionPredicate to flush every 100 MB, so all files get archived but are split across several archives, which works but is not the best choice.
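The effect of that workaround can be sketched without Camel. The snippet below is a hypothetical stand-in for the size check that a completionPredicate would perform: it closes the current batch as soon as the running total reaches a limit, so each archive stays small and the per-archive rebuild cost stays bounded. The class name, the method name and the "flush once the limit is reached" rule are assumptions made for illustration, not part of the route above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of size-based flushing; not Camel API.
final class SizeBatchingSketch {

    // Groups file sizes into batches, closing a batch once its total reaches limitBytes.
    static List<List<Long>> splitBySize(List<Long> fileSizes, long limitBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long total = 0;
        for (long size : fileSizes) {
            current.add(size);
            total += size;
            if (total >= limitBytes) { // the point where the aggregation would complete
                batches.add(current);
                current = new ArrayList<>();
                total = 0;
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // leftover files form the last, smaller archive
        }
        return batches;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> sizes = List.of(60 * mb, 50 * mb, 30 * mb, 80 * mb, 10 * mb);
        List<List<Long>> batches = splitBySize(sizes, 100 * mb);
        System.out.println(batches.size() + " archives would be produced");
    }
}
```

The trade-off is the one described above: the input ends up in several archives instead of one, in exchange for never rebuilding an archive much larger than the limit.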

      Is there a general way to make ZipAggregationStrategy run in near-linear time, so that the process does not slow down as the number of files grows?

        Attachments

        1. Screenshot 2019-04-08 18.41.10.png
          26 kB
          Mykhailo Kozik

              People

              • Assignee:
                bedla Jan Bednar
                Reporter:
                mishadoff Mykhailo Kozik
              • Votes:
                0
                Watchers:
                2

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  20m