Solr / SOLR-7808

MorphlineGoLiveMiniMRTest fails using YARN

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib - MapReduce
    • Labels: None

      Description

         [junit4]    > Throwable #1: java.io.IOException: Failed on local exception: java.io.IOException: Connection reset by peer; Host Details : local host is: "drob-rhel/172.25.10.46"; destination host is: "drob-rhel":52807; 
         [junit4]    > 	at __randomizedtesting.SeedInfo.seed([4D93EC191980246A:C5C7D3C3B77C4992]:0)
         [junit4]    > 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
         [junit4]    > 	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
         [junit4]    > 	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
         [junit4]    > 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
         [junit4]    > 	at com.sun.proxy.$Proxy111.getClusterMetrics(Unknown Source)
         [junit4]    > 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
         [junit4]    > 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
         [junit4]    > 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
         [junit4]    > 	at com.sun.proxy.$Proxy112.getClusterMetrics(Unknown Source)
         [junit4]    > 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:461)
         [junit4]    > 	at org.apache.hadoop.mapred.ResourceMgrDelegate.getClusterMetrics(ResourceMgrDelegate.java:151)
         [junit4]    > 	at org.apache.hadoop.mapred.YARNRunner.getClusterMetrics(YARNRunner.java:179)
         [junit4]    > 	at org.apache.hadoop.mapreduce.Cluster.getClusterStatus(Cluster.java:246)
         [junit4]    > 	at org.apache.hadoop.mapred.JobClient$3.run(JobClient.java:719)
         [junit4]    > 	at org.apache.hadoop.mapred.JobClient$3.run(JobClient.java:717)
         [junit4]    > 	at java.security.AccessController.doPrivileged(Native Method)
         [junit4]    > 	at javax.security.auth.Subject.doAs(Subject.java:422)
         [junit4]    > 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
         [junit4]    > 	at org.apache.hadoop.mapred.JobClient.getClusterStatus(JobClient.java:717)
         [junit4]    > 	at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:645)
         [junit4]    > 	at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:608)
         [junit4]    > 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
         [junit4]    > 	at org.apache.solr.hadoop.MorphlineGoLiveMiniMRTest.test(MorphlineGoLiveMiniMRTest.java:400)
         [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
         [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
         [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
         [junit4]    > Caused by: java.io.IOException: Connection reset by peer
         [junit4]    > 	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
         [junit4]    > 	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
         [junit4]    > 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
         [junit4]    > 	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
         [junit4]    > 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
         [junit4]    > 	at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
         [junit4]    > 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
         [junit4]    > 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
         [junit4]    > 	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
         [junit4]    > 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
         [junit4]    > 	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
         [junit4]    > 	at java.io.DataOutputStream.flush(DataOutputStream.java:123)
         [junit4]    > 	at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1030)
         [junit4]    > 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         [junit4]    > 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         [junit4]    > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         [junit4]    > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         [junit4]    > 	... 1 more
      

      The failing frame, MapReduceIndexerTool.run (MapReduceIndexerTool.java:645), corresponds to this call:

          int mappers = new JobClient(job.getConfiguration()).getClusterStatus().getMaxMapTasks(); // MR1
          //int mappers = job.getCluster().getClusterStatus().getMapSlotCapacity(); // Yarn only
      

      Later, this segment would also fail:

          int reducers = new JobClient(job.getConfiguration()).getClusterStatus().getMaxReduceTasks(); // MR1
          //reducers = job.getCluster().getClusterStatus().getReduceSlotCapacity(); // Yarn only
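
      One way a caller could tolerate this kind of RPC failure is to fall back to a default value when the cluster status query throws. The following is only a sketch under stated assumptions: `SlotCapacity`, `Query`, and `orDefault` are hypothetical names, not Solr or Hadoop APIs, and the lambda stands in for a call such as `new JobClient(conf).getClusterStatus().getMaxMapTasks()`. It does not address whatever caused the connection reset.

```java
import java.io.IOException;

/** Hypothetical helper for illustration; not part of Solr or Hadoop. */
final class SlotCapacity {

    /** Hypothetical stand-in for a cluster status query that may fail with an IOException. */
    @FunctionalInterface
    interface Query {
        int get() throws IOException;
    }

    private SlotCapacity() {}

    /**
     * Runs the query and returns its result, or the fallback when the
     * query throws an IOException or reports a non-positive capacity.
     */
    static int orDefault(Query query, int fallback) {
        try {
            int slots = query.get();
            return slots > 0 ? slots : fallback;
        } catch (IOException e) {
            System.err.println("Cluster status query failed, using fallback "
                    + fallback + ": " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulates the failure from the stack trace with a throwing query.
        int mappers = orDefault(() -> {
            throw new IOException("Connection reset by peer");
        }, 1);
        System.out.println("mappers = " + mappers);
    }
}
```

      In real code the lambda would wrap the existing JobClient call, and the fallback would be whatever slot count is reasonable for the test cluster.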
      

            People

            • Assignee:
              Unassigned
            • Reporter:
              mdrob Mike Drob