Kudu / KUDU-2831

DistributedDataGeneratorTest.testGenerateRandomData is flaky



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.10.0
    • Component/s: spark, test
    • Labels:


      Saw this once last month and again today, so not super flaky but still worth fixing:

      1) testGenerateRandomData(org.apache.kudu.spark.tools.DistributedDataGeneratorTest)
      java.lang.AssertionError: expected:<100> but was:<99>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:834)
      	at org.junit.Assert.assertEquals(Assert.java:645)
      	at org.junit.Assert.assertEquals(Assert.java:631)
      	at org.apache.kudu.spark.tools.DistributedDataGeneratorTest.testGenerateRandomData(DistributedDataGeneratorTest.scala:58)

      I talked about this with Grant Henke when it last happened. The issue appears to be in the LongAccumulator used to track collisions in the data generator. Before the failure, the test logged this:

      02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:134) Rows written: 99
      02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:135) Collisions: 1

      The assert code looks like this:

          val collisions = ss.sparkContext.longAccumulator("row_collisions").value
          // Collisions could cause the number of rows to be less than the number set.
          assertEquals(numRows - collisions, rdd.collect.length)

      So the test read a value of zero from this LongAccumulator even though there was one collision. Our thinking was that accumulators like this one are updated asynchronously, so if we don't wait for the entire job to finish, we may not see their up-to-date values at assertion time.
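
      One more thing that may be worth ruling out alongside the asynchronous-update theory: as far as I know, SparkContext.longAccumulator(name) creates and registers a new accumulator each time it is called rather than looking up an existing one by name, so a test that calls it after the job has run would read zero regardless of what the job counted. The sketch below is not the kudu-spark code; the object name, the "every 10th row collides" rule, and the local master are assumptions made purely for illustration.

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorLookupSketch {
  // Returns (value read through the held reference,
  //          value read through a new accumulator with the same name).
  def run(ss: SparkSession): (Long, Long) = {
    val sc = ss.sparkContext
    // Hypothetical stand-in for the accumulator that the data generator owns.
    val jobCollisions = sc.longAccumulator("row_collisions")
    // Pretend every 10th row collides. foreach is an action, so it returns
    // only after all of its tasks have finished.
    sc.parallelize(1 to 100, 4).foreach { i =>
      if (i % 10 == 0) jobCollisions.add(1L)
    }
    // Registers a second accumulator that merely shares the name.
    val sameName = sc.longAccumulator("row_collisions")
    (jobCollisions.value.longValue(), sameName.value.longValue())
  }

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder()
      .master("local[2]")
      .appName("accumulator-lookup-sketch")
      .getOrCreate()
    try {
      val (held, fresh) = run(ss)
      println(s"held reference: $held, new accumulator with the same name: $fresh")
    } finally ss.stop()
  }
}
```

      In this sketch the held reference should report the ten simulated collisions while the same-named accumulator stays at zero. If the test is in that situation, it would pass whenever the generator happens to produce no collisions and fail only on the occasional run that does, which would also look like flakiness.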

      We publish other LongAccumulators in kudu-spark, but AFAICT this is the only one that is asserted on. Nevertheless, it would be great if we could solve this in some generic way so that if someone wrote a test that used a different LongAccumulator, the race could be avoided.
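
      One generic shape this could take, offered as a sketch and not as the actual fix: have the code that owns the accumulators snapshot them on the driver after its action has returned, and hand the snapshot back to the caller, so tests never need to reach into the SparkContext for an accumulator. GeneratorMetrics, GeneratorMetricsSketch, generate, and the collideEvery parameter are all hypothetical names introduced here for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical result type; not necessarily what kudu-spark exposes.
final case class GeneratorMetrics(rowsWritten: Long, collisions: Long)

object GeneratorMetricsSketch {
  // Hypothetical stand-in for the generator: rows whose index is a multiple
  // of `collideEvery` are counted as collisions instead of being written.
  def generate(ss: SparkSession, numRows: Int, collideEvery: Int): GeneratorMetrics = {
    val sc = ss.sparkContext
    val rowsWritten = sc.longAccumulator("rows_written")
    val collisions = sc.longAccumulator("row_collisions")
    sc.parallelize(1 to numRows, 4).foreach { i =>
      if (i % collideEvery == 0) collisions.add(1L) else rowsWritten.add(1L)
    }
    // Snapshot the values on the driver once the action has returned.
    GeneratorMetrics(rowsWritten.value.longValue(), collisions.value.longValue())
  }
}
```

      A test written against this shape would compare numRows - metrics.collisions with the number of rows it reads back, and any other accumulator we publish could be exposed to tests the same way.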




            • Assignee:
              wdberkeley William Berkeley
              adar Adar Dembo


              • Created: