• Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved


      As part of Mark Miller's push to clean up tests, one change he made in his big SOLR-12801 commit (circa Nov 2018) was to disable the randomized use of TLOG replicas in a lot of tests.

      His comments at the time were that he suspected a lot of the problems he was seeing were due to a poor implementation of TestInjection.waitForInSyncWithLeader() (which only comes into play for TLOG replicas), ultimately leading to him creating SOLR-12313.

      But based on some limited experimentation I did with re-enabling TLOG replica randomization in some tests after (essentially) removing TestInjection.waitForInSyncWithLeader() in SOLR-13168, I'm still seeing a lot of sporadic test failures when TLOG replicas get used. The only change is that instead of "failing slow" because of the stalls introduced by TestInjection.waitForInSyncWithLeader(), the tests now fail quickly.

      It's not clear whether these failures are because the tests have bugs, because the tests don't account for the expected behavior of TLOG replicas in certain situations, or because the code paths being tested have bugs when dealing with TLOG replicas.

      Bottom line: as things stand today, TLOG replicas aren't being very thoroughly tested, particularly in edge cases (HTTP partitions, LIR, leader election, mixed use of replica types, etc...)


              Assignee: Unassigned
              Reporter: Chris M. Hostetter (hossman)