Camel / CAMEL-3497

Splitter Component: Setting 'streaming="true" parallelProcessing="true"' consumes large amounts of heap space for big original messages

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.6.0
    • Component/s: camel-core
    • Labels: None

      Description

      Setting 'streaming="true" parallelProcessing="true"' consumes large amounts of heap space for big original messages. E.g., 1024m of heap is not enough to process an 80 MB file with 500,000 lines, splitting it line by line.
      The problem seems to be the ArrayList in MulticastProcessor, line 224. It contains a Future<Exchange> object for every token delivered by the java.util.Scanner, and the list is only cleared (goes out of scope) after all Future objects have completed.
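      For context, a minimal sketch (not taken from the reporter's setup) of the kind of route that triggers this, written in the Java DSL; it is the equivalent of streaming="true" parallelProcessing="true" in the XML DSL, and the file and log endpoints are illustrative:

      import org.apache.camel.builder.RouteBuilder;

      public class BigFileSplitRoute extends RouteBuilder {
          @Override
          public void configure() throws Exception {
              // Stream a big file and split it line by line, processing the lines in parallel.
              // On Camel 2.5.0 this combination builds up one Future<Exchange> per line in
              // MulticastProcessor, so heap usage grows with the number of lines in the file.
              from("file://target/inbox?noop=true")
                  .split(body().tokenize("\n")).streaming().parallelProcessing()
                      .to("log:split")
                  .end();
          }
      }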

        Activity

        Claus Ibsen added a comment -

        See the nabble thread: http://camel.465427.n5.nabble.com/2-Bugs-in-Splitter-Camel-2-5-0-tp3326727p3326727.html

        Claus Ibsen added a comment -

        Yeah, the tasks list is only used for cancelling tasks that no longer need to be processed because we are done due to a timeout or stop on exception. So it should be possible to refactor the code to not use a task list for that.

        Claus Ibsen added a comment -

        The CompletionService holds a reference to the Future, so there is no real gain.

        Claus Ibsen added a comment -

        The issue is that the splitter copies the exchange for each split message, and the CompletionService keeps a reference to all of those exchanges, which means we end up with a lot of Exchange objects in memory at once, and that eats up the heap.

        Will have to come up with some way of discarding exchanges that are no longer needed during processing. Maybe even use something other than the CompletionService if it is the culprit.
        I added this to the known issues in the Camel 2.5 release notes.
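        To illustrate the pattern described above (a stripped-down sketch, not Camel's actual code): when every task is submitted up front and the CompletionService is only drained after the loop, every Future, and thus every copied exchange it wraps, stays reachable until the very end:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.concurrent.CompletionService;
        import java.util.concurrent.ExecutorCompletionService;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        public class SubmitAllThenDrain {
            public static void main(String[] args) throws Exception {
                ExecutorService pool = Executors.newFixedThreadPool(4);
                CompletionService<String> completion = new ExecutorCompletionService<>(pool);

                // Phase 1: submit one task per "line". Each Future stays reachable (via this
                // list and via the completion queue) until it is taken, so memory grows
                // linearly with the number of lines.
                List<Future<String>> tasks = new ArrayList<>();
                for (int i = 0; i < 500_000; i++) {
                    final int line = i;
                    tasks.add(completion.submit(() -> "processed line " + line));
                }

                // Phase 2: results are only drained here, long after all of them were produced.
                for (int i = 0; i < tasks.size(); i++) {
                    completion.take().get();
                }
                pool.shutdown();
            }
        }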

        Claus Ibsen added a comment -

        Camel 3.0 will have internal optimizations that help reduce the memory footprint used during routing.

        Claus Ibsen added a comment -

        trunk on Camel 2.6 in rev 1056325:
        Cancelling future tasks is now done using a running boolean instead of keeping a big array list with the future references.
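        A hedged sketch of that idea (not the actual patch): instead of collecting every Future just to be able to cancel them later, each task checks a shared running flag and returns early once processing should stop, so nothing has to hold on to the Futures:

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.atomic.AtomicBoolean;

        public class RunningFlagCancellation {
            // Shared flag that replaces keeping a List<Future<?>> around only for cancel().
            private static final AtomicBoolean running = new AtomicBoolean(true);

            public static void main(String[] args) throws Exception {
                ExecutorService pool = Executors.newFixedThreadPool(4);
                for (int i = 0; i < 1_000; i++) {
                    final int id = i;
                    pool.submit(() -> {
                        // A task that starts after a timeout or stopOnException simply returns.
                        if (running.get()) {
                            process(id);
                        }
                    });
                }
                // Simulate a timeout or stopOnException: flip the flag instead of iterating
                // a list of futures and cancelling each one.
                running.set(false);
                pool.shutdown();
                pool.awaitTermination(30, TimeUnit.SECONDS);
            }

            private static void process(int id) {
                // placeholder for the per-message work
            }
        }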

        Claus Ibsen added a comment -

        trunk on Camel 2.6 in rev 1056380:

        I have reduced the memory consumption, which should make it a bit better. But the splitter still uses some additional memory because the splitting is based on a copy of the input message for each split message.

        Ralf, you are welcome to test again and see if you can process a bit more than previously.

        Claus Ibsen added a comment -

        Okay, good news. I refactored the logic so Camel now aggregates the parallel tasks on the fly.

        This makes a tremendous difference. Now I can split a file into 50,000 sub-messages and process them in 7 seconds, using at most 18 MB.
        Before, I would hit an issue at about 25,000-30,000 messages and get an OOME with 130 MB.

        Since there is a separate task that aggregates on the fly while the other task submits new tasks, the logic is more complex and the two tasks have to signal each other. They need to agree on when there are no more messages to split and when all of those have been aggregated.
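        A minimal sketch of that hand-shake, under the assumption that one side submits work while a separate aggregator drains the CompletionService as results complete; an allSubmitted flag plus a submitted counter is one way for the two sides to agree that everything has been split and everything has been aggregated (the real Camel code differs in detail):

        import java.util.concurrent.CompletionService;
        import java.util.concurrent.ExecutorCompletionService;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.atomic.AtomicBoolean;
        import java.util.concurrent.atomic.AtomicInteger;

        public class OnTheFlyAggregation {
            public static void main(String[] args) throws Exception {
                ExecutorService pool = Executors.newFixedThreadPool(4);
                CompletionService<String> completion = new ExecutorCompletionService<>(pool);

                AtomicBoolean allSubmitted = new AtomicBoolean(false);
                AtomicInteger submitted = new AtomicInteger();

                // Aggregator task on its own executor: take results as soon as they complete,
                // so only in-flight results are held in memory, never all of them at once.
                ExecutorService aggregatorPool = Executors.newSingleThreadExecutor();
                Future<Integer> aggregator = aggregatorPool.submit(() -> {
                    int aggregated = 0;
                    while (!allSubmitted.get() || aggregated < submitted.get()) {
                        Future<String> next = completion.poll(100, TimeUnit.MILLISECONDS);
                        if (next != null) {
                            next.get();    // fold the result into the aggregate here
                            aggregated++;
                        }
                    }
                    return aggregated;
                });

                // Submitter: split and submit on the fly; the Futures are never kept.
                for (int i = 0; i < 50_000; i++) {
                    final int line = i;
                    submitted.incrementAndGet();
                    completion.submit(() -> "processed line " + line);
                }
                allSubmitted.set(true);    // signal: no more messages to split

                System.out.println("aggregated " + aggregator.get() + " messages");
                aggregatorPool.shutdown();
                pool.shutdown();
            }
        }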

        Claus Ibsen added a comment -

        I ran a test with 1,000,000 rows in a file:

        2011-01-08 18:25:44,216 [read #9 - Split] INFO  split                          - Received: 1000000 messages so far. Last group took: 50 millis which is: 20,000 messages per second. average: 17,775.566
        2011-01-08 18:25:44,217 [main           ] INFO  SplitterParallelBigFileTest    - Took 57.423 seconds
        2011-01-08 18:25:44,218 [://target/split] INFO  route1                         - Done splitting bigfile.txt
        

        And the memory usage was at most 33 MB at peak.

        Claus Ibsen added a comment -

        trunk: 1056744.

        Now it should run with low memory consumption and you should be able to process very big files.

        Ralf, feel free to test with the latest code on your system.

        Claus Ibsen added a comment -

        Fixed a rare potential deadlock issue where the aggregate task was not given time to run because the thread pool was overloaded when running in parallel mode on the multicast/splitter.

        trunk: 1057139.
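        (As an aside, one common way to avoid that kind of starvation, and the reason the aggregation sketch above runs the aggregator on its own single-thread executor, is to keep the aggregating task out of the shared worker pool entirely; whether the actual fix in rev 1057139 does exactly that is not stated here.)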


          People

          • Assignee: Claus Ibsen
          • Reporter: Ralf Steppacher
          • Votes: 0
          • Watchers: 0
