Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
1.0.2
None
Description
When tested in a multithreaded environment, ShuffleGrouping produces an extremely uneven distribution of task IDs.
This appears to be caused by the Collections.shuffle call here: https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java#L58
Because current is set to zero before the shuffle, other threads can read from the ArrayList while it is still being shuffled in place.
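A minimal sketch of a race-free alternative follows, assuming the goal is simply a round-robin over a pre-shuffled task list. It is not Storm's code or the patch that resolved this issue; the class SafeShuffleChooser and its methods chooseTask and countConcurrently are hypothetical. The list is shuffled once in the constructor and never mutated afterwards, and an AtomicInteger cursor hands out positions, so no thread can observe a half-shuffled list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Storm's actual fix: shuffle once up front, never
// mutate the shared list afterwards, and advance a shared AtomicInteger cursor.
final class SafeShuffleChooser {
    private final List<Integer> choices;                      // read-only after construction
    private final AtomicInteger cursor = new AtomicInteger(0);

    SafeShuffleChooser(List<Integer> targetTasks, Random random) {
        List<Integer> copy = new ArrayList<>(targetTasks);
        Collections.shuffle(copy, random);                    // happens before any thread calls chooseTask()
        this.choices = Collections.unmodifiableList(copy);
    }

    int chooseTask() {
        // getAndIncrement gives each caller a distinct ticket; floorMod keeps
        // the index non-negative even after the int counter overflows.
        int index = Math.floorMod(cursor.getAndIncrement(), choices.size());
        return choices.get(index);
    }

    // Calls chooseTask() from several threads and counts how often each task is picked.
    static Map<Integer, Integer> countConcurrently(SafeShuffleChooser chooser, int threads, int callsPerThread)
            throws InterruptedException {
        Map<Integer, Integer> counts = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    counts.merge(chooser.chooseTask(), 1, Integer::sum); // atomic per key
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        SafeShuffleChooser chooser = new SafeShuffleChooser(Arrays.asList(10, 11, 12, 13), new Random(42));
        // 4 threads x 100,000 calls over 4 tasks: each task should be picked exactly 100,000 times.
        System.out.println(countConcurrently(chooser, 4, 100_000));
    }
}
```

Because every call receives a distinct cursor value, the counts come out exactly even regardless of thread interleaving. The trade-off is that the order is fixed after construction rather than reshuffled each round; reshuffling safely would mean building a fresh shuffled list and swapping it in atomically (for example through an AtomicReference) instead of shuffling the shared list in place.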
Stephen's gist includes a test that produces a very uneven distribution of task IDs from ShuffleGrouping: https://gist.github.com/Crim/61537958df65a5e13b3844b2d5e28cde
I would expect the task IDs chosen by ShuffleGrouping to be almost uniformly distributed.