[METRON-672] SolrIndexingIntegrationTest fails intermittently - ASF JIRA

Details

Type: Bug
Status: Done
Priority: Major
Resolution: Done
Affects Version/s: 0.3.0
Fix Version/s: 0.3.1
Labels:
None

Description

Adapted from a dev list conversation

Initial Error in Travis

Jon noted this in the Travis builds

`test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest):
Took too long to complete: 150582 > 150000`, more details below:

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
166.167 sec <<< FAILURE!

test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
Time elapsed: 166.071 sec  <<< ERROR!

Additional Notes

Casey was able to reproduce this locally (but not in his IDE). Couple details in the dev list excerpt.

Fixing this should ideally include adding more detailed logging to hopefully avoid these issues in the future. As a note, unfortunately in this case, Casey notes that logging seems to make this issue rarer. Still, logging to be able to understand the flow (and tuning logging level as appropriate) would help resolve issues in the future.

Dev List Excerpt

Per Casey:

Ok, so now I'm concerned that this isn't a fluke. Here's an excerpt from
the failing logs on travis for my PR with substantially longer timeouts (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt)
Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
0 vs 10 vs 0
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
10 vs 10 vs 6
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
317.056 sec <<< FAILURE!
test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
Time elapsed: 316.949 sec  <<< ERROR!
java.lang.RuntimeException: Took too long to complete: 300783 > 300000
        at org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
        at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
I'm getting the impression that this isn't the timeout and we have a
mystery on our hands. Each of those lines "10 vs 10 vs 6" happen 15
seconds apart. That line means that it read 10 entries from kafka, 10
entries from the indexed data and 6 entries from HDFS. It's that 6
entries that is the problem. Also of note, this does not seem to
happen to me locally AND it's not consistent on Travis. Given all
that I'd say that it's a problem with the HDFS Writer not getting
flushed, but I verified that it is indeed flushed per message.

One more thing, just for posterity here, it always freezes at 6 records
written to HDFS. That's the reason I thought it was a flushing issue.

One thing that I would caution though is that this is likely a heisenbug.
The more logging I added earlier made it less likely to occur. It seems
more likely to occur on Travis than locally and I made it happen by
repeatedly running mvn install on Metron-solr (after a mvn install of the
whole project).

SolrIndexingIntegrationTest fails intermittently

Details

Description

Initial Error in Travis

Additional Notes

Dev List Excerpt

Attachments

Issue Links

Activity

People

Dates