Description
If request logging is enabled, ProduceRequest can be accessed
and mutated concurrently from a network thread (which calls
toString) and the request handler thread (which calls
clearPartitionRecords()).
That can lead to a ConcurrentModificationException when iterating
the partitionRecords map.
The underlying thread-safety issue has existed since the server
started using the Java implementation of ProduceRequest in 0.10.0.
However, we were incorrectly not clearing the underlying struct until
0.10.2, so toString itself was thread-safe until that change. In 0.10.2,
toString is no longer thread-safe and we could potentially see a
NullPointerException given the right set of interleavings between
toString and clearPartitionRecords although we haven't seen that
happen yet.
In trunk, we changed the requests to have a toStruct method
instead of creating a struct in the constructor and toString was
no longer printing the contents of the Struct. This accidentally
fixed the race condition, but it meant that request logging was less
useful.
A couple of days ago, AbstractRequest.toString was changed to
print the contents of the request by calling toStruct().toString()
and reintroduced the race condition. The impact is more visible
because we iterate over a HashMap, which proactively
checks for concurrent modification (unlike arrays).
We will need a separate PR for 0.10.2.