Uploaded image for project: 'Metron (Retired)'
  1. Metron (Retired)
  2. METRON-672

SolrIndexingIntegrationTest fails intermittently

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Done
    • Major
    • Resolution: Done
    • 0.3.0
    • 0.3.1
    • None

    Description

      Adapted from a dev list conversation

      Initial Error in Travis

      Jon noted this in the Travis builds

      `test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest):
      Took too long to complete: 150582 > 150000`, more details below:
      
      Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
      166.167 sec <<< FAILURE!
      
      test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
      Time elapsed: 166.071 sec  <<< ERROR!
      

      Additional Notes

      Casey was able to reproduce this locally (but not in his IDE). Couple details in the dev list excerpt.

      Fixing this should ideally include adding more detailed logging to hopefully avoid these issues in the future. As a note, unfortunately in this case, Casey notes that logging seems to make this issue rarer. Still, logging to be able to understand the flow (and tuning logging level as appropriate) would help resolve issues in the future.

      Dev List Excerpt

      Per Casey:

      Ok, so now I'm concerned that this isn't a fluke. Here's an excerpt from
      the failing logs on travis for my PR with substantially longer timeouts (
      https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt)

      Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
      0 vs 10 vs 0
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
      10 vs 10 vs 6
      Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
      317.056 sec <<< FAILURE!
      test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
      Time elapsed: 316.949 sec  <<< ERROR!
      java.lang.RuntimeException: Took too long to complete: 300783 > 300000
              at org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
              at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
      

      I'm getting the impression that this isn't the timeout and we have a
      mystery on our hands. Each of those lines "10 vs 10 vs 6" happen 15
      seconds apart. That line means that it read 10 entries from kafka, 10
      entries from the indexed data and 6 entries from HDFS. It's that 6
      entries that is the problem. Also of note, this does not seem to
      happen to me locally AND it's not consistent on Travis. Given all
      that I'd say that it's a problem with the HDFS Writer not getting
      flushed, but I verified that it is indeed flushed per message.

      One more thing, just for posterity here, it always freezes at 6 records
      written to HDFS. That's the reason I thought it was a flushing issue.

      One thing that I would caution though is that this is likely a heisenbug.
      The more logging I added earlier made it less likely to occur. It seems
      more likely to occur on Travis than locally and I made it happen by
      repeatedly running mvn install on Metron-solr (after a mvn install of the
      whole project).

      Attachments

        Issue Links

          Activity

            People

              cestella Casey Stella
              justinleet Justin Leet
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: