vinothchandar commented on pull request #8076: KAFKA-9512: Flaky Test LagFetchIntegrationTest.shouldFetchLagsDuringRestoration
URL: https://github.com/apache/kafka/pull/8076
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
- Added additional synchronization and increased timeouts to handle flakiness
- Added some pre-cautionary retries when trying to obtain lag map
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Ran locally the entire suite 100+ times (2x as much as it took to reproduce the original issue)
![Screen Shot 2020-02-09 at 5 57 31 PM](https://user-images.githubusercontent.com/1179324/74115829-c4937680-4b65-11ea-9a97-6c657e40f4a9.png)
P.S: Also ran the two tests I added in `QueryableStateIntegrationTest`. They seem solid (standby test ran 1900 times without flaking out)
-
-
- Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
NPE can happen only if an empty lag map is added to `onRestoreEnd` i.e when restoration finishes there is no entry for changelog topic? This is not possible esp for standby, since there should be a standby task . The test clearly waits till we reach zero lag, using the same stateStoreName.. and that seems to be working..
So I wonder if there is some race between restoration ending and the standy task creation? In any case, the problematic line seems redundant anyway, since it just checks for the same thing as the waitForCondition()