Solr
  1. Solr
  2. SOLR-7603

Scary non reproducible failure from UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      jenkins nightly hit a very inexplicable error today...

      Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/860/
      At revision 1682097
      Updating http://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x at revision '2015-05-27T14:50:50.016 -0400'
      [java-info] java version "1.7.0_72"
      [java-info] Java(TM) SE Runtime Environment (1.7.0_72-b14, Oracle Corporation)
      [java-info] Java HotSpot(TM) 64-Bit Server VM (24.72-b04, Oracle Corporation)
      
        [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=UpdateRequestProcessorFactoryTest
      -Dtests.method=testUpdateDistribChainSkipping -Dtests.seed=4ECABCCFD159BE21 -Dtests.multiplier=2
      -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/x1/jenkins/lucene-data/enwiki.random.lines.txt
      -Dtests.locale=mt_MT -Dtests.timezone=Etc/GMT0 -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
         [junit4] FAILURE 0.01s J0 | UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping <<<
         [junit4]    > Throwable #1: java.lang.AssertionError
         [junit4]    >        at __randomizedtesting.SeedInfo.seed([4ECABCCFD159BE21:3F2E4219A7B299ED]:0)
         [junit4]    >        at
      org.apache.solr.update.processor.UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping(UpdateRequestProcessorFactoryTest.java:111)
         [junit4]    >        at java.lang.Thread.run(Thread.java:745)
      

      ...the line in question is asserting that when executing a distributed update (ie: forwarded from another node), the LogUpdateProcessor is still part of the chain because it's got got the "RunAlways" annotation indicating it should always be included in the chain (everything before hte DistribUpdateProcessor is normally)

      There's really no explanation for why the LogUpdateProcessor wouldn't be found other then a code bug – but in that case why doesn't the seed reproduce reliably?

        Activity

        Hide
        Hoss Man added a comment -

        full log from jenkins.

        for now, the only thing i can think of, is to add some more logging & assert details to the test to help us diagnose if it fails again.

        Show
        Hoss Man added a comment - full log from jenkins. for now, the only thing i can think of, is to add some more logging & assert details to the test to help us diagnose if it fails again.
        Hide
        Hoss Man added a comment -

        fixed description - i was missunderstanding why the Log processor was expected by the test.

        Show
        Hoss Man added a comment - fixed description - i was missunderstanding why the Log processor was expected by the test.
        Hide
        ASF subversion and git services added a comment -

        Commit 1682137 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1682137 ]

        SOLR-7603: more test verbosity to try and track this down in the future

        Show
        ASF subversion and git services added a comment - Commit 1682137 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1682137 ] SOLR-7603 : more test verbosity to try and track this down in the future
        Hide
        ASF subversion and git services added a comment -

        Commit 1682138 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1682138 ]

        SOLR-7603: more test verbosity to try and track this down in the future (merge r1682137)

        Show
        ASF subversion and git services added a comment - Commit 1682138 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1682138 ] SOLR-7603 : more test verbosity to try and track this down in the future (merge r1682137)
        Hide
        Hoss Man added a comment - - edited

        This happened again last night with the new test assertions providing a bit more detail...

        Date: Fri, 29 May 2015 09:42:29 +0000 (UTC)
        Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/696/
        Updating http://svn.apache.org/repos/asf/lucene/dev/trunk at revision '2015-05-29T03:12:26.055 -0400'
        At revision 1682377
        
        -print-java-info:
        [java-info] java version "1.8.0_25"
        [java-info] Java(TM) SE Runtime Environment (1.8.0_25-b17, Oracle Corporation)
        [java-info] Java HotSpot(TM) 64-Bit Server VM (25.25-b02, Oracle Corporation)
        [java-info] Test args: []
        
        java.lang.AssertionError: distrib-chain-explicit expected LogUpdateProcessor in chain due to @RunAllways, but not
        found: org.apache.solr.update.processor.DistributedUpdateProcessor@638c6e19,
        org.apache.solr.update.processor.RemoveBlankFieldUpdateProcessorFactory$1@25bbdfcb,
        org.apache.solr.update.processor.RunUpdateProcessor@473fea05, 
                at __randomizedtesting.SeedInfo.seed([3AF6852C6379681:724B9684B0DCB14D]:0)
                at org.junit.Assert.fail(Assert.java:93)
                at org.junit.Assert.assertTrue(Assert.java:43)
                at
        org.apache.solr.update.processor.UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping(UpdateRequestProcessorFactoryTest.java:127)
        
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=UpdateRequestProcessorFactoryTest -Dtests.method=testUpdateDistribChainSkipping -Dtests.seed=3AF6852C6379681 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/x1/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=ru -Dtests.timezone=Asia/Baghdad -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
           [junit4] FAILURE 0.03s J2 | UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping <<<
           [junit4]    > Throwable #1: java.lang.AssertionError: distrib-chain-explicit expected LogUpdateProcessor in chain due to @RunAllways, but not found: org.apache.solr.update.processor.DistributedUpdateProcessor@638c6e19, org.apache.solr.update.processor.RemoveBlankFieldUpdateProcessorFactory$1@25bbdfcb, org.apache.solr.update.processor.RunUpdateProcessor@473fea05, 
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([3AF6852C6379681:724B9684B0DCB14D]:0)
           [junit4]    > 	at org.apache.solr.update.processor.UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping(UpdateRequestProcessorFactoryTest.java:127)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
        

        (EDIT: updated with reproduce line now that builds.apache.org is back up)

        I realy can't make heads or tails of why LogUpdateProcessor wouldn't be in that chain.


        FWIW: first incidence i can find of a failure from this assert is...

        Date: Sat, 2 May 2015 17:14:16 +0000 (UTC)
        Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/835/
        
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=UpdateRequestProcessorFactoryTest -Dtests.method=testUpdateDistribChainSkipping -Dtests.seed=D82449739AD0D42D -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=nl -Dtests.timezone=America/Denver -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] FAILURE 0.01s J0 | UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping <<<
           [junit4]    > Throwable #1: java.lang.AssertionError
           [junit4]    >        at __randomizedtesting.SeedInfo.seed([D82449739AD0D42D:A9C0B7A5EC3BF3E1]:0)
           [junit4]    >        at org.apache.solr.update.processor.UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping(UpdateRequestProcessorFactoryTest.java:111)
           [junit4]    >        at java.lang.Thread.run(Thread.java:745)
        
        Show
        Hoss Man added a comment - - edited This happened again last night with the new test assertions providing a bit more detail... Date: Fri, 29 May 2015 09:42:29 +0000 (UTC) Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/696/ Updating http://svn.apache.org/repos/asf/lucene/dev/trunk at revision '2015-05-29T03:12:26.055 -0400' At revision 1682377 -print-java-info: [java-info] java version "1.8.0_25" [java-info] Java(TM) SE Runtime Environment (1.8.0_25-b17, Oracle Corporation) [java-info] Java HotSpot(TM) 64-Bit Server VM (25.25-b02, Oracle Corporation) [java-info] Test args: [] java.lang.AssertionError: distrib-chain-explicit expected LogUpdateProcessor in chain due to @RunAllways, but not found: org.apache.solr.update.processor.DistributedUpdateProcessor@638c6e19, org.apache.solr.update.processor.RemoveBlankFieldUpdateProcessorFactory$1@25bbdfcb, org.apache.solr.update.processor.RunUpdateProcessor@473fea05, at __randomizedtesting.SeedInfo.seed([3AF6852C6379681:724B9684B0DCB14D]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.update.processor.UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping(UpdateRequestProcessorFactoryTest.java:127) [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=UpdateRequestProcessorFactoryTest -Dtests.method=testUpdateDistribChainSkipping -Dtests.seed=3AF6852C6379681 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/x1/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=ru -Dtests.timezone=Asia/Baghdad -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 0.03s J2 | UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping <<< [junit4] > Throwable #1: java.lang.AssertionError: distrib-chain-explicit expected LogUpdateProcessor in chain due to @RunAllways, but not found: org.apache.solr.update.processor.DistributedUpdateProcessor@638c6e19, org.apache.solr.update.processor.RemoveBlankFieldUpdateProcessorFactory$1@25bbdfcb, org.apache.solr.update.processor.RunUpdateProcessor@473fea05, [junit4] > at __randomizedtesting.SeedInfo.seed([3AF6852C6379681:724B9684B0DCB14D]:0) [junit4] > at org.apache.solr.update.processor.UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping(UpdateRequestProcessorFactoryTest.java:127) [junit4] > at java.lang.Thread.run(Thread.java:745) ( EDIT : updated with reproduce line now that builds.apache.org is back up) I realy can't make heads or tails of why LogUpdateProcessor wouldn't be in that chain. FWIW: first incidence i can find of a failure from this assert is... Date: Sat, 2 May 2015 17:14:16 +0000 (UTC) Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/835/ [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=UpdateRequestProcessorFactoryTest -Dtests.method=testUpdateDistribChainSkipping -Dtests.seed=D82449739AD0D42D -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=nl -Dtests.timezone=America/Denver -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] FAILURE 0.01s J0 | UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping <<< [junit4] > Throwable #1: java.lang.AssertionError [junit4] > at __randomizedtesting.SeedInfo.seed([D82449739AD0D42D:A9C0B7A5EC3BF3E1]:0) [junit4] > at org.apache.solr.update.processor.UpdateRequestProcessorFactoryTest.testUpdateDistribChainSkipping(UpdateRequestProcessorFactoryTest.java:111) [junit4] > at java.lang.Thread.run(Thread.java:745)
        Hide
        ASF subversion and git services added a comment -

        Commit 1682564 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1682564 ]

        SOLR-7603: more detail in asserts, and more asserts on the initial chain (before looking at the distributed version) to try and figure out WTF is going on here

        Show
        ASF subversion and git services added a comment - Commit 1682564 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1682564 ] SOLR-7603 : more detail in asserts, and more asserts on the initial chain (before looking at the distributed version) to try and figure out WTF is going on here
        Hide
        ASF subversion and git services added a comment -

        Commit 1682565 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1682565 ]

        SOLR-7603: more detail in asserts, and more asserts on the initial chain (before looking at the distributed version) to try and figure out WTF is going on here (merge r1682564)

        Show
        ASF subversion and git services added a comment - Commit 1682565 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1682565 ] SOLR-7603 : more detail in asserts, and more asserts on the initial chain (before looking at the distributed version) to try and figure out WTF is going on here (merge r1682564)
        Hide
        Hoss Man added a comment -

        TL;DR: still no clue, but added more logging/assert details to test

        i've been beating my head against this and still can't make heads or tells of this failure – the best guess i've got is that the logic for "pruning" the distributed chain down (but including any "RunAllways" processors) is actually working fine, but perhaps there is some failure in the initial construction of the chain in the first place? (SOLR-6892 recently modified the way the chains are initialized)

        So i've added a hack to increase the log level for the duration of the test, as well as some more asserts regarding the state of the chain, and simplified the logic around how we assert properties of the distributed chain so it's a bit more straight forward an we can include a list of every proc in every assert.

        Show
        Hoss Man added a comment - TL;DR: still no clue, but added more logging/assert details to test i've been beating my head against this and still can't make heads or tells of this failure – the best guess i've got is that the logic for "pruning" the distributed chain down (but including any "RunAllways" processors) is actually working fine, but perhaps there is some failure in the initial construction of the chain in the first place? ( SOLR-6892 recently modified the way the chains are initialized) So i've added a hack to increase the log level for the duration of the test, as well as some more asserts regarding the state of the chain, and simplified the logic around how we assert properties of the distributed chain so it's a bit more straight forward an we can include a list of every proc in every assert.
        Hide
        Hoss Man added a comment -

        TL;DR: fairly certain this is a (nightly only) test bug caused by SOLR-7408, working on fix

        On IRC Tim drew my attention to some behavior of the LogUpdateProcessorFactory that i had completely forgotten about...

          @Override
          public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
            return LogUpdateProcessor.log.isInfoEnabled() ? new LogUpdateProcessor(req, rsp, this, next) : null;
          }
        

        ...in a nut shell: as an optimization, the factory doesn't produce a processor if it can tell from the current logging level that there is no point in using that processor.

        Tim's theory was that some of the recent MDC/logging related changes may be affecting hte log level used i nthe nightly tests – but i'm fairly certain the root cause is much more of a fluke...

        SOLR-7408 introduced ConcurrentDeleteAndCreateCollectionTest in r1675274 on "Apr 22 08:25:26 2015 UTC" .. this is an @Nightly test that has this bit of code in it...

            Logger.getLogger("org.apache.solr").setLevel(Level.WARN);
        

        Which means if this test runs before UpdateRequestProcessorFactoryTest in the same JVM, the log level won't be low enough for the LogUpdateProcessor to ever be created.

        I've confirmed that happeend in both of the very recent failures...

        https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/696/consoleText

           [junit4]   2> NOTE: All tests run in this JVM: [NotRequiredUniqueKeyTest, ShardSplitTest, DeleteLastCustomShardedReplicaTest, RequestLoggingTest, TestReplicaProperties, TestCSVResponseWriter, TestShardHandlerFactory, QueryEqualityTest, TestAnalyzeInfixSuggestions, TestSystemIdResolver, InfoHandlerTest, TestLRUStatsCache, StatsComponentTest, CoreAdminRequestStatusTest, SoftAutoCommitTest, FacetPivotSmallTest, TestTrie, SimpleFacetsTest, TestExtendedDismaxParser, TestQueryUtils, TestDocSet, TestPerFieldSimilarity, SolrCoreCheckLockOnStartupTest, SearchHandlerTest, TestSchemaManager, DirectUpdateHandlerTest, HdfsChaosMonkeySafeLeaderTest, SpellCheckCollatorTest, SolrInfoMBeanTest, TestConfigOverlay, DistributedQueueTest, TestXIncludeConfig, TestSolrJ, OutputWriterTest, HdfsLockFactoryTest, TestLeaderElectionZkExpiry, TestBM25SimilarityFactory, AddSchemaFieldsUpdateProcessorFactoryTest, PathHierarchyTokenizerFactoryTest, DistributedFacetPivotSmallAdvancedTest, BinaryUpdateRequestHandlerTest, SOLR749Test, TestComplexPhraseQParserPlugin, TestRandomMergePolicy, CachingDirectoryFactoryTest, LeaderElectionTest, HdfsNNFailoverTest, TestManagedSchemaDynamicFieldResource, OverseerCollectionProcessorTest, TestQuerySenderNoQuery, SortByFunctionTest, TestNRTOpen, AddBlockUpdateTest, TestBinaryResponseWriter, AutoCommitTest, CloudExitableDirectoryReaderTest, TestExactSharedStatsCache, HighlighterMaxOffsetTest, TestDefaultSimilarityFactory, TestSchemaNameResource, TestAuthorizationFramework, TestSchemaSimilarityResource, TestCollationFieldDocValues, TestZkChroot, ConcurrentDeleteAndCreateCollectionTest, BufferStoreTest, TestRestoreCore, QueryResultKeyTest, TermVectorComponentDistributedTest, TestFieldTypeCollectionResource, TestSolrConfigHandlerConcurrent, MoreLikeThisHandlerTest, TestChildDocTransformer, CursorMarkTest, TestSimpleQParserPlugin, XsltUpdateRequestHandlerTest, TestSurroundQueryParser, OverseerTest, FullSolrCloudDistribCmdsTest, ZkSolrClientTest, TestRandomDVFaceting, ZkCLITest, TestDistributedSearch, TestDistributedGrouping, TestRecovery, TestRealTimeGet, TestStressReorder, TestJoin, TestReload, HardAutoCommitTest, TestRangeQuery, TestGroupingSearch, SolrCmdDistributorTest, PeerSyncTest, BadIndexSchemaTest, TestSort, TestFiltering, BasicFunctionalityTest, TestIndexSearcher, ShowFileRequestHandlerTest, CurrencyFieldOpenExchangeTest, DistributedQueryElevationComponentTest, SolrIndexSplitterTest, AnalysisAfterCoreReloadTest, SignatureUpdateProcessorFactoryTest, SuggesterFSTTest, CoreAdminHandlerTest, SolrRequestParserTest, TestFoldingMultitermQuery, DocValuesTest, SuggesterTest, SpatialFilterTest, PolyFieldTest, NoCacheHeaderTest, WordBreakSolrSpellCheckerTest, SchemaVersionSpecificBehaviorTest, TestPseudoReturnFields, FieldMutatingUpdateProcessorTest, TestAtomicUpdateErrorCases, DirectUpdateHandlerOptimizeTest, TestRemoteStreaming, TestSolrDeletionPolicy1, StandardRequestHandlerTest, TestWriterPerf, DirectSolrSpellCheckerTest, TestReversedWildcardFilterFactory, DocumentAnalysisRequestHandlerTest, TestQueryTypes, PrimitiveFieldTypeTest, TestOmitPositions, FileBasedSpellCheckerTest, XmlUpdateRequestHandlerTest, DocumentBuilderTest, TestValueSourceCache, TestIndexingPerformance, RequiredFieldsTest, FieldAnalysisRequestHandlerTest, TestSolrQueryParser, LoggingHandlerTest, RegexBoostProcessorTest, SolrPluginUtilsTest, TestCollationField, ReturnFieldsTest, UpdateRequestProcessorFactoryTest]
        

        https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/860/consoleText

           [junit4]   2> NOTE: All tests run in this JVM: [TermVectorComponentTest, MBeansHandlerTest, TestSolrDeletionPolicy1, TestDocumentBuilder, HighlighterConfigTest, DistributedSpellCheckComponentTest, TestSchemaManager, EnumFieldTest, TestNRTOpen, TestStressLucene, TestJoin, AsyncMigrateRouteKeyTest, URLClassifyProcessorTest, TestBinaryField, SignatureUpdateProcessorFactoryTest, TestJmxMonitoredMap, TestIBSimilarityFactory, TestRandomFaceting, DistributedFacetPivotSmallTest, TestEmbeddedSolrServerConstructors, AnalyticsMergeStrategyTest, TestDFRSimilarityFactory, SuggesterWFSTTest, HdfsRecoveryZkTest, TestReRankQParserPlugin, DistributedExpandComponentTest, IndexBasedSpellCheckerTest, TestWordDelimiterFilterFactory, BufferStoreTest, QueryElevationComponentTest, TestJettySolrRunner, TestSweetSpotSimilarityFactory, DistribCursorPagingTest, TestArbitraryIndexDir, DistributedFacetPivotLargeTest, TestConfig, TestBadConfig, ConcurrentDeleteAndCreateCollectionTest, SolrCloudExampleTest, TestUniqueKeyFieldResource, TestExceedMaxTermLength, BlockCacheTest, SolrIndexConfigTest, CursorMarkTest, TestPhraseSuggestions, TestImplicitCoreProperties, TestSolr4Spatial, TestRawResponseWriter, TestFaceting, DateFieldTest, TestCryptoKeys, RuleEngineTest, OverseerStatusTest, PrimitiveFieldTypeTest, ExternalFileFieldSortTest, MultiThreadedOCPTest, RequestHandlersTest, TestRangeQuery, RulesTest, ShardSplitTest, TestCodecSupport, RecoveryAfterSoftCommitTest, CoreMergeIndexesAdminHandlerTest, LeaderElectionIntegrationTest, SchemaVersionSpecificBehaviorTest, MinimalSchemaTest, ParsingFieldUpdateProcessorsTest, ClusterStateTest, DateMathParserTest, DebugComponentTest, TestLazyCores, TestManagedSchemaDynamicFieldResource, ChaosMonkeyNothingIsSafeTest, SyncSliceTest, ZkNodePropsTest, TestMergePolicyConfig, TestCSVLoader, SolrRequestParserTest, BasicZkTest, TestFunctionQuery, ActionThrottleTest, BasicDistributedZkTest, ChaosMonkeySafeLeaderTest, OverseerTest, ClusterStateUpdateTest, TestZkChroot, TestRandomDVFaceting, ZkCLITest, TestDistributedSearch, ZkControllerTest, TestRealTimeGet, DistributedTermsComponentTest, TestCoreContainer, SolrCoreTest, PeerSyncTest, BadIndexSchemaTest, TestSort, BasicFunctionalityTest, DirectUpdateHandlerTest, HighlighterTest, SoftAutoCommitTest, ShowFileRequestHandlerTest, CurrencyFieldXmlFileTest, SimplePostToolTest, TestExtendedDismaxParser, SuggesterFSTTest, DocValuesTest, SuggesterTSTTest, PolyFieldTest, TestUpdate, TestAtomicUpdateErrorCases, DirectUpdateHandlerOptimizeTest, DefaultValueUpdateProcessorTest, SortByFunctionTest, TestRemoteStreaming, DocValuesMultiTest, DistanceFunctionTest, SolrInfoMBeanTest, XsltUpdateRequestHandlerTest, CacheHeaderTest, TestQueryUtils, StandardRequestHandlerTest, TestWriterPerf, TestReversedWildcardFilterFactory, DocumentAnalysisRequestHandlerTest, TestQueryTypes, TestOmitPositions, FileBasedSpellCheckerTest, TermsComponentTest, DocumentBuilderTest, TestValueSourceCache, PathHierarchyTokenizerFactoryTest, TestIndexingPerformance, RequiredFieldsTest, FieldAnalysisRequestHandlerTest, TestSolrQueryParser, FastVectorHighlighterTest, IndexSchemaRuntimeFieldTest, LoggingHandlerTest, SolrPluginUtilsTest, IndexSchemaTest, TestJmxIntegration, TestCollationField, UpdateRequestProcessorFactoryTest]
        

        ...ironically, one of the changes i recently made was to force the log level to DEBUG for the duration of the UpdateRequestProcessorFactoryTest so we'd get full logs in the even of failure – so i'm pretty sure that will prevent the failure from ever happening again. but what i plan to do as a more long term fix:

        • fix ConcurrentDeleteAndCreateCollectionTest to reset the log level it munges
        • make UpdateRequestProcessorFactoryTest assert the neccessary log level needed for LogUpdateProcessor to future proof us on similar bugs
        • (eventually) remove the forced DEBUG that i just added to the test (once some time has gone by w/o any other failures just in case i'm wrong)
        Show
        Hoss Man added a comment - TL;DR: fairly certain this is a (nightly only) test bug caused by SOLR-7408 , working on fix On IRC Tim drew my attention to some behavior of the LogUpdateProcessorFactory that i had completely forgotten about... @Override public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { return LogUpdateProcessor.log.isInfoEnabled() ? new LogUpdateProcessor(req, rsp, this , next) : null ; } ...in a nut shell: as an optimization, the factory doesn't produce a processor if it can tell from the current logging level that there is no point in using that processor. Tim's theory was that some of the recent MDC/logging related changes may be affecting hte log level used i nthe nightly tests – but i'm fairly certain the root cause is much more of a fluke... SOLR-7408 introduced ConcurrentDeleteAndCreateCollectionTest in r1675274 on "Apr 22 08:25:26 2015 UTC" .. this is an @Nightly test that has this bit of code in it... Logger.getLogger( "org.apache.solr" ).setLevel(Level.WARN); Which means if this test runs before UpdateRequestProcessorFactoryTest in the same JVM, the log level won't be low enough for the LogUpdateProcessor to ever be created. I've confirmed that happeend in both of the very recent failures... https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/696/consoleText [junit4] 2> NOTE: All tests run in this JVM: [NotRequiredUniqueKeyTest, ShardSplitTest, DeleteLastCustomShardedReplicaTest, RequestLoggingTest, TestReplicaProperties, TestCSVResponseWriter, TestShardHandlerFactory, QueryEqualityTest, TestAnalyzeInfixSuggestions, TestSystemIdResolver, InfoHandlerTest, TestLRUStatsCache, StatsComponentTest, CoreAdminRequestStatusTest, SoftAutoCommitTest, FacetPivotSmallTest, TestTrie, SimpleFacetsTest, TestExtendedDismaxParser, TestQueryUtils, TestDocSet, TestPerFieldSimilarity, SolrCoreCheckLockOnStartupTest, SearchHandlerTest, TestSchemaManager, DirectUpdateHandlerTest, HdfsChaosMonkeySafeLeaderTest, SpellCheckCollatorTest, SolrInfoMBeanTest, TestConfigOverlay, DistributedQueueTest, TestXIncludeConfig, TestSolrJ, OutputWriterTest, HdfsLockFactoryTest, TestLeaderElectionZkExpiry, TestBM25SimilarityFactory, AddSchemaFieldsUpdateProcessorFactoryTest, PathHierarchyTokenizerFactoryTest, DistributedFacetPivotSmallAdvancedTest, BinaryUpdateRequestHandlerTest, SOLR749Test, TestComplexPhraseQParserPlugin, TestRandomMergePolicy, CachingDirectoryFactoryTest, LeaderElectionTest, HdfsNNFailoverTest, TestManagedSchemaDynamicFieldResource, OverseerCollectionProcessorTest, TestQuerySenderNoQuery, SortByFunctionTest, TestNRTOpen, AddBlockUpdateTest, TestBinaryResponseWriter, AutoCommitTest, CloudExitableDirectoryReaderTest, TestExactSharedStatsCache, HighlighterMaxOffsetTest, TestDefaultSimilarityFactory, TestSchemaNameResource, TestAuthorizationFramework, TestSchemaSimilarityResource, TestCollationFieldDocValues, TestZkChroot, ConcurrentDeleteAndCreateCollectionTest, BufferStoreTest, TestRestoreCore, QueryResultKeyTest, TermVectorComponentDistributedTest, TestFieldTypeCollectionResource, TestSolrConfigHandlerConcurrent, MoreLikeThisHandlerTest, TestChildDocTransformer, CursorMarkTest, TestSimpleQParserPlugin, XsltUpdateRequestHandlerTest, TestSurroundQueryParser, OverseerTest, FullSolrCloudDistribCmdsTest, ZkSolrClientTest, TestRandomDVFaceting, ZkCLITest, TestDistributedSearch, TestDistributedGrouping, TestRecovery, TestRealTimeGet, TestStressReorder, TestJoin, TestReload, HardAutoCommitTest, TestRangeQuery, TestGroupingSearch, SolrCmdDistributorTest, PeerSyncTest, BadIndexSchemaTest, TestSort, TestFiltering, BasicFunctionalityTest, TestIndexSearcher, ShowFileRequestHandlerTest, CurrencyFieldOpenExchangeTest, DistributedQueryElevationComponentTest, SolrIndexSplitterTest, AnalysisAfterCoreReloadTest, SignatureUpdateProcessorFactoryTest, SuggesterFSTTest, CoreAdminHandlerTest, SolrRequestParserTest, TestFoldingMultitermQuery, DocValuesTest, SuggesterTest, SpatialFilterTest, PolyFieldTest, NoCacheHeaderTest, WordBreakSolrSpellCheckerTest, SchemaVersionSpecificBehaviorTest, TestPseudoReturnFields, FieldMutatingUpdateProcessorTest, TestAtomicUpdateErrorCases, DirectUpdateHandlerOptimizeTest, TestRemoteStreaming, TestSolrDeletionPolicy1, StandardRequestHandlerTest, TestWriterPerf, DirectSolrSpellCheckerTest, TestReversedWildcardFilterFactory, DocumentAnalysisRequestHandlerTest, TestQueryTypes, PrimitiveFieldTypeTest, TestOmitPositions, FileBasedSpellCheckerTest, XmlUpdateRequestHandlerTest, DocumentBuilderTest, TestValueSourceCache, TestIndexingPerformance, RequiredFieldsTest, FieldAnalysisRequestHandlerTest, TestSolrQueryParser, LoggingHandlerTest, RegexBoostProcessorTest, SolrPluginUtilsTest, TestCollationField, ReturnFieldsTest, UpdateRequestProcessorFactoryTest] https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/860/consoleText [junit4] 2> NOTE: All tests run in this JVM: [TermVectorComponentTest, MBeansHandlerTest, TestSolrDeletionPolicy1, TestDocumentBuilder, HighlighterConfigTest, DistributedSpellCheckComponentTest, TestSchemaManager, EnumFieldTest, TestNRTOpen, TestStressLucene, TestJoin, AsyncMigrateRouteKeyTest, URLClassifyProcessorTest, TestBinaryField, SignatureUpdateProcessorFactoryTest, TestJmxMonitoredMap, TestIBSimilarityFactory, TestRandomFaceting, DistributedFacetPivotSmallTest, TestEmbeddedSolrServerConstructors, AnalyticsMergeStrategyTest, TestDFRSimilarityFactory, SuggesterWFSTTest, HdfsRecoveryZkTest, TestReRankQParserPlugin, DistributedExpandComponentTest, IndexBasedSpellCheckerTest, TestWordDelimiterFilterFactory, BufferStoreTest, QueryElevationComponentTest, TestJettySolrRunner, TestSweetSpotSimilarityFactory, DistribCursorPagingTest, TestArbitraryIndexDir, DistributedFacetPivotLargeTest, TestConfig, TestBadConfig, ConcurrentDeleteAndCreateCollectionTest, SolrCloudExampleTest, TestUniqueKeyFieldResource, TestExceedMaxTermLength, BlockCacheTest, SolrIndexConfigTest, CursorMarkTest, TestPhraseSuggestions, TestImplicitCoreProperties, TestSolr4Spatial, TestRawResponseWriter, TestFaceting, DateFieldTest, TestCryptoKeys, RuleEngineTest, OverseerStatusTest, PrimitiveFieldTypeTest, ExternalFileFieldSortTest, MultiThreadedOCPTest, RequestHandlersTest, TestRangeQuery, RulesTest, ShardSplitTest, TestCodecSupport, RecoveryAfterSoftCommitTest, CoreMergeIndexesAdminHandlerTest, LeaderElectionIntegrationTest, SchemaVersionSpecificBehaviorTest, MinimalSchemaTest, ParsingFieldUpdateProcessorsTest, ClusterStateTest, DateMathParserTest, DebugComponentTest, TestLazyCores, TestManagedSchemaDynamicFieldResource, ChaosMonkeyNothingIsSafeTest, SyncSliceTest, ZkNodePropsTest, TestMergePolicyConfig, TestCSVLoader, SolrRequestParserTest, BasicZkTest, TestFunctionQuery, ActionThrottleTest, BasicDistributedZkTest, ChaosMonkeySafeLeaderTest, OverseerTest, ClusterStateUpdateTest, TestZkChroot, TestRandomDVFaceting, ZkCLITest, TestDistributedSearch, ZkControllerTest, TestRealTimeGet, DistributedTermsComponentTest, TestCoreContainer, SolrCoreTest, PeerSyncTest, BadIndexSchemaTest, TestSort, BasicFunctionalityTest, DirectUpdateHandlerTest, HighlighterTest, SoftAutoCommitTest, ShowFileRequestHandlerTest, CurrencyFieldXmlFileTest, SimplePostToolTest, TestExtendedDismaxParser, SuggesterFSTTest, DocValuesTest, SuggesterTSTTest, PolyFieldTest, TestUpdate, TestAtomicUpdateErrorCases, DirectUpdateHandlerOptimizeTest, DefaultValueUpdateProcessorTest, SortByFunctionTest, TestRemoteStreaming, DocValuesMultiTest, DistanceFunctionTest, SolrInfoMBeanTest, XsltUpdateRequestHandlerTest, CacheHeaderTest, TestQueryUtils, StandardRequestHandlerTest, TestWriterPerf, TestReversedWildcardFilterFactory, DocumentAnalysisRequestHandlerTest, TestQueryTypes, TestOmitPositions, FileBasedSpellCheckerTest, TermsComponentTest, DocumentBuilderTest, TestValueSourceCache, PathHierarchyTokenizerFactoryTest, TestIndexingPerformance, RequiredFieldsTest, FieldAnalysisRequestHandlerTest, TestSolrQueryParser, FastVectorHighlighterTest, IndexSchemaRuntimeFieldTest, LoggingHandlerTest, SolrPluginUtilsTest, IndexSchemaTest, TestJmxIntegration, TestCollationField, UpdateRequestProcessorFactoryTest] ...ironically, one of the changes i recently made was to force the log level to DEBUG for the duration of the UpdateRequestProcessorFactoryTest so we'd get full logs in the even of failure – so i'm pretty sure that will prevent the failure from ever happening again. but what i plan to do as a more long term fix: fix ConcurrentDeleteAndCreateCollectionTest to reset the log level it munges make UpdateRequestProcessorFactoryTest assert the neccessary log level needed for LogUpdateProcessor to future proof us on similar bugs (eventually) remove the forced DEBUG that i just added to the test (once some time has gone by w/o any other failures just in case i'm wrong)
        Hide
        ASF subversion and git services added a comment -

        Commit 1682570 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1682570 ]

        SOLR-7603: more test tweaks to protect ourselves from unexpected log levels in tests like the one introduced by SOLR-7408

        Show
        ASF subversion and git services added a comment - Commit 1682570 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1682570 ] SOLR-7603 : more test tweaks to protect ourselves from unexpected log levels in tests like the one introduced by SOLR-7408
        Hide
        ASF subversion and git services added a comment -

        Commit 1682571 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1682571 ]

        SOLR-7603: more test tweaks to protect ourselves from unexpected log levels in tests like the one introduced by SOLR-7408 (merge r1682570)

        Show
        ASF subversion and git services added a comment - Commit 1682571 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1682571 ] SOLR-7603 : more test tweaks to protect ourselves from unexpected log levels in tests like the one introduced by SOLR-7408 (merge r1682570)
        Hide
        ASF subversion and git services added a comment -

        Commit 1684714 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1684714 ]

        SOLR-7603: remove extra logging added to diagnose problem - failure hasn't reappeared since fix applied in r1682570

        Show
        ASF subversion and git services added a comment - Commit 1684714 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1684714 ] SOLR-7603 : remove extra logging added to diagnose problem - failure hasn't reappeared since fix applied in r1682570
        Hide
        ASF subversion and git services added a comment -

        Commit 1684718 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1684718 ]

        SOLR-7603: remove extra logging added to diagnose problem - failure hasn't reappeared since fix applied in r1682570 (merge r1684714)

        Show
        ASF subversion and git services added a comment - Commit 1684718 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1684718 ] SOLR-7603 : remove extra logging added to diagnose problem - failure hasn't reappeared since fix applied in r1682570 (merge r1684714)
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development