Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.8.0
- None
Description
Steps
- Create a simple flow:
- GenerateFlowFile (with constant payload "txt1,txt2" and 10 secs scheduling)
- -> SplitContent (with comma as a separator)
- -> some chain of processors that takes "txt1" and "txt2" as inbound parameters and produces flow files with more than 1 record (that's important). For example, I use ExtractText (to get "txt1" and "txt2" as an attribute), then ExecuteSQLRecord (to execute SQL using "txt1" and "txt2" as a parameter)
- -> MergeRecord (with Defragment merge strategy - that's important)
- -> LogAttribute or whatever you prefer to observe the merge result
- Now just run the flow
Result: the logs show an error like
Could not merge bin with 1 FlowFiles because of the 'fragment.count' attribute had a value of '2' but only 1 of 2 FlowFiles were encountered before this bin was evicted (due to to Max Bin Age being reached or due to the Maximum Number of Bins being exceeded).
Expected result: a single flow file containing the records from both SQL queries (for "txt1" and "txt2")
The cause is that RecordBinManager uses the fragment.count flow file attribute as the number of records required to release the bin. However, that attribute holds the number of flow files, not records. In the scenario above each flow file contains more than one record (at least 2), so RecordBin considers the bin "full enough" as soon as the first flow file arrives (it contains >= 2 records and fragment.count is 2 in this scenario). The bin is therefore released prematurely.
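To make the mismatch concrete, here is a minimal sketch of the kind of check described above. It is not the actual RecordBinManager or RecordBin code; the class and method names are hypothetical and only model a record count being compared against fragment.count.

```java
public class DefragmentBugSketch {

    // Hypothetical model of the faulty check: the bin is treated as full once
    // the number of RECORDS in it reaches fragment.count, although
    // fragment.count is really the expected number of FLOW FILES.
    public static boolean isFullByRecordCount(int recordsInBin, int fragmentCount) {
        return recordsInBin >= fragmentCount;
    }

    public static void main(String[] args) {
        int fragmentCount = 2;          // SplitContent produced 2 fragments: "txt1", "txt2"
        int recordsInFirstFlowFile = 2; // the first SQL result set carries 2 records

        // Only 1 of 2 flow files has arrived, yet the bin already looks full.
        boolean full = isFullByRecordCount(recordsInFirstFlowFile, fragmentCount);
        System.out.println("full after first flow file: " + full);
    }
}
```

With these numbers the sketch prints `full after first flow file: true`, which corresponds to the bin being released while only 1 of the 2 fragments is present.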
I think this is a mistake: in Defragment mode we are interested in the number of flow files and never in the number of records. Conversely, the number of records is what we should care about when using the Bin-Packing Algorithm.
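The distinction proposed above could be sketched as follows. This is an illustration under assumptions, not the fix that was actually committed: the `Strategy` enum, the `isFullEnough` method, and the `minRecords` parameter are hypothetical names introduced for this example.

```java
public class MergeThresholdSketch {

    // Hypothetical stand-ins for the two merge strategies named in this report.
    public enum Strategy { DEFRAGMENT, BIN_PACKING }

    // Decide whether a bin can be released.
    // DEFRAGMENT: compare the number of flow files against fragment.count.
    // BIN_PACKING: compare the number of records against a record threshold.
    public static boolean isFullEnough(Strategy strategy, int flowFilesInBin,
                                       int recordsInBin, int fragmentCount, int minRecords) {
        switch (strategy) {
            case DEFRAGMENT:
                return flowFilesInBin >= fragmentCount;
            case BIN_PACKING:
                return recordsInBin >= minRecords;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        // One flow file with 2 records, fragment.count = 2: not complete yet.
        System.out.println(isFullEnough(Strategy.DEFRAGMENT, 1, 2, 2, 1));
        // Both flow files present (4 records in total): complete.
        System.out.println(isFullEnough(Strategy.DEFRAGMENT, 2, 4, 2, 1));
    }
}
```

Under this sketch the scenario from the steps above releases the bin only after both fragments have arrived, regardless of how many records each one carries.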