Yep, it is a little complicated. That is why it is well covered by JUnit tests and comments:
Class                 Line Coverage   Branch Coverage   Complexity
CollectionSearchUtil  91% (51/56)     81% (36/44)       6.5
There is NO bulk copying into a temporary collection. There is only one array of size sqrt(2*N), allocated once, and copying is linear (the same cost as iterating).
Let's analyze the resources this search requires on a collection of size N:
Memory usage: == sqrt(2*N) array elements
Array copying: <= N
Iteration (next() calls): <= N
Comparisons (with the MD5/Column comparator): <= sqrt(2*N)
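The patch code is not reproduced in this comment, so the following is only a hypothetical Java sketch of one search that has exactly these bounds. It assumes the input is an Iterator over N elements sorted by a Comparator, with N known up front. Blocks shrink by one element each step (k, k-1, ..., 1, with k(k+1)/2 >= N, so k is about sqrt(2*N)); each block is copied into one reusable buffer, a single comparison on the block's last element decides whether the target is inside, and only that block is then scanned. The names DecreasingStepSearch, blockCount and firstAtLeast are illustrative and are not the API of CollectionSearchUtil.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the actual CollectionSearchUtil from the patch.
public final class DecreasingStepSearch {

    private DecreasingStepSearch() {}

    /** Smallest k such that 1 + 2 + ... + k >= n, i.e. k is about sqrt(2 * n). */
    static int blockCount(int n) {
        int k = 0;
        long covered = 0;
        while (covered < n) {
            k++;
            covered += k;
        }
        return k;
    }

    /**
     * Returns the first element that is >= key, or null if there is none.
     * Assumption: the iterator yields n elements in ascending order of cmp.
     */
    public static <T> T firstAtLeast(Iterator<T> it, int n, T key, Comparator<? super T> cmp) {
        int step = Math.max(blockCount(n), 1);
        // The only buffer: allocated once with capacity for the largest block.
        List<T> block = new ArrayList<>(step);
        while (it.hasNext()) {
            block.clear();
            // Copy the next `step` elements; each element is fetched and copied once.
            while (block.size() < step && it.hasNext()) {
                block.add(it.next());
            }
            // One comparison per block: does the answer lie inside this block?
            T last = block.get(block.size() - 1);
            if (cmp.compare(last, key) >= 0) {
                // Linear scan; the last element is already known to qualify.
                for (int i = 0; i < block.size() - 1; i++) {
                    if (cmp.compare(block.get(i), key) >= 0) {
                        return block.get(i);
                    }
                }
                return last;
            }
            // Shrink the next block so the total comparison count stays <= blockCount(n).
            step = Math.max(step - 1, 1);
        }
        return null;
    }
}
```

If the answer lies in block i (counting from 0), the search spends i + 1 comparisons on block ends plus at most k - i - 1 inside the block, so at most k in total; a miss costs at most one comparison per block, and there are at most k blocks.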
I have a solution (see patch 1) that re-iterates instead of allocating an array, but it requires O(N^2) iteration steps in the worst case.
In the second patch, a little memory is traded for a single pass with the iterator. Iteration can be slow, so the array is the better choice.
You can check that for a search over a million columns (a pretty big row) the array length is ~1414. That is not much for such a huge search, I think.
For a row of 10k columns it is only a 141-element array, which is very small.
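Assuming the sqrt(2*N) figure comes from block sizes that shrink by one per step (as in a decreasing-step search), the array length k is the smallest integer whose triangular number reaches N:

$$\sum_{i=1}^{k} i = \frac{k(k+1)}{2} \ge N \quad\Rightarrow\quad k = \left\lceil \frac{\sqrt{8N+1}-1}{2} \right\rceil \approx \sqrt{2N}$$

For N = 10^6 this gives k = 1414 (1414 * 1415 / 2 = 1,000,405), and for N = 10^4 it gives k = 141 (141 * 142 / 2 = 10,011).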
About getSortedColumns(byte startWith):
Yes, it is another good solution. binarySearch is a little faster when you have indexed access to the underlying Columns (like a List or an array).
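For comparison, a minimal Java sketch of the indexed alternative, assuming the sorted columns are available as a duplicate-free List. It relies only on the standard java.util.Collections.binarySearch, which returns -(insertionPoint) - 1 when the key is absent; IndexedSearch and firstIndexAtLeast are hypothetical names, not the ISortedColumns or getSortedColumns API.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper illustrating the binary-search alternative.
public final class IndexedSearch {

    private IndexedSearch() {}

    /**
     * Index of the first element >= key in an ascending, duplicate-free list,
     * or sorted.size() if every element is smaller than key.
     */
    public static <T> int firstIndexAtLeast(List<? extends T> sorted, T key, Comparator<? super T> cmp) {
        int i = Collections.binarySearch(sorted, key, cmp);
        // A negative result encodes -(insertionPoint) - 1; decode it back.
        return i >= 0 ? i : -(i + 1);
    }
}
```

This only pays off with real indexed access: per the JDK documentation, on a large list that does not implement RandomAccess, Collections.binarySearch falls back to an iterator-based search with O(n) link traversals.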
But there is still one disadvantage: my patch solves this problem by changing only a few lines of code,
while this solution requires many more code changes. Lots of code changes mean lower release stability. Sorry for sharing my pain in JIRA tickets, but 1.0.3 seems to be the last stable release.
As a compromise, I suggest applying this patch and filing a separate ticket to clean up the code, move to binary search, and add the new API to ISortedColumns.