Apache Tez / TEZ-1924

Tez AM does not register with the RM using its FQDN, causing jobs to fail in some environments


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.2
    • Fix Version/s: 0.5.4
    • Component/s: None
    • Labels: None

    Description

      Issue originally reported by Karam Singh.

      All OrderedWordCount, WordCount, and Tez faultTolerance system tests failed with java.net.UnknownHostException.
      Interestingly, other Tez examples such as mrrsleep, randomwriter, randomtextwriter, sort, join_inner, join_outer, terasort, and groupbyorderbymrrtest ran fine.
      One such failure follows:

      RUNNING: /usr/lib/hadoop/bin/hadoop jar /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount "-DUSE_TEZ_SESSION=true" "-Dmapreduce.map.memory.mb=2048" "-Dtez.am.shuffle-vertex-manager.max-src-fraction=0" "-Dmapreduce.reduce.memory.mb=2048" "-Dmapreduce.framework.name=yarn-tez" "-Dtez.am.container.reuse.enabled=false" "-Dtez.am.log.level=DEBUG" "-Dmapreduce.map.java.opts=-Xmx1024m" "-Dtez.am.shuffle-vertex-manager.min-src-fraction=0" "-Dmapreduce.job.reduce.slowstart.completedmaps=0.01" "-Dmapreduce.reduce.java.opts=-Xmx1024m" "-Dtez.am.container.session.delay-allocation-millis=120000" /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 -generateSplitsInClient true
      14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
      14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
      14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
      14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
      14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
      14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
      14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging directory wasb://humb-tez1@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
      14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
      14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
      14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
      14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
      14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application application_1418977790315_0016
      14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
      14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, outputPath=/user/hrt_qa/Tez_CROutput_1
      14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, splitsDir=wasb://humb-tez1@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
      14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 20
      14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to get into ready state
      14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via proxy
      org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
      	at org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
      	at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
      	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
      	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
      	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
      	at com.sun.proxy.$Proxy24.getAMStatus(Unknown Source)
      	at org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:337)
      	... 14 more
      Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
      	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
      	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
      	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
      	... 16 more
      Caused by: java.net.UnknownHostException
      	... 21 more
      
      ....................
      ....................
      
      Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
      	at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
      	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
      	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
      	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
      	... 16 more
      Caused by: java.net.UnknownHostException
      	... 21 more
      14/12/19 09:25:19 ERROR examples.OrderedWordCount: Error occurred when submitting/running DAGs
      java.lang.RuntimeException: TezSession has already shutdown
      	at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
      	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
      	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
      	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      14/12/19 09:25:19 INFO examples.OrderedWordCount: Shutting down session
      14/12/19 09:25:19 INFO client.TezSession: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1418977790315_0016
      14/12/19 09:25:19 INFO client.TezSession: Failed to shutdown Tez Session via proxy
      org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1418977790315_0016, yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, trackingUrl=http://headnode0.humb-tez1-ssh.d5.internal.cloudapp.net:8088/proxy/application_1418977790315_0016/A
      	at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
      	at org.apache.tez.client.TezSession.stop(TezSession.java:281)
      	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:524)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
      	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
      	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      14/12/19 09:25:19 INFO client.TezSession: Could not connect to AM, killing session via YARN, sessionName=OrderedWordCountSession, applicationId=application_1418977790315_0016
      14/12/19 09:25:19 INFO impl.YarnClientImpl: Killed application application_1418977790315_0016
      java.lang.RuntimeException: TezSession has already shutdown
      	at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
      	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
      	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
      	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      
      

      Contents of /etc/hosts are:

      127.0.0.1 localhost
      
      # The following lines are desirable for IPv6 capable hosts
      ::1 ip6-localhost ip6-loopback
      fe00::0 ip6-localnet
      ff00::0 ip6-mcastprefix
      ff02::1 ip6-allnodes
      ff02::2 ip6-allrouters
      ff02::3 ip6-allhosts
      

      The contents of /etc/resolv.conf are:

      # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
      #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
      nameserver 168.63.129.16
      search humb-tez1-ssh.d5.internal.cloudapp.net
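
The log above shows the client trying to reach the AM at the short name "workernode1", while the ResourceManager is addressed by a fully qualified name under the search domain from resolv.conf. As a minimal sketch of that difference (not the change made in the attached patch, which is not reproduced here), the following Java prints the short and canonical names of the local host and shows how a short name maps to an FQDN. The class name HostNameSketch and the helpers looksFullyQualified and qualify are hypothetical names used only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameSketch {

    /**
     * Naive check: true if the name has a non-empty label on both sides of
     * its first dot. An IP literal such as 10.0.0.87 would also pass, so this
     * is illustrative only.
     */
    public static boolean looksFullyQualified(String host) {
        if (host == null || host.isEmpty()) {
            return false;
        }
        int dot = host.indexOf('.');
        return dot > 0 && dot < host.length() - 1;
    }

    /** Appends a search domain to a short name; qualified names are returned unchanged. */
    public static String qualify(String host, String searchDomain) {
        return looksFullyQualified(host) ? host : host + "." + searchDomain;
    }

    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            // May be a short name, depending on how the machine is configured.
            System.out.println("getHostName():          " + local.getHostName());
            // Asks the resolver for the fully qualified name; falls back to the
            // textual IP address if the lookup fails.
            System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
        } catch (UnknownHostException e) {
            System.out.println("local host name does not resolve: " + e.getMessage());
        }
        // Host and domain taken from the log and resolv.conf in this report.
        System.out.println(qualify("workernode1", "humb-tez1-ssh.d5.internal.cloudapp.net"));
    }
}
```

Running this on an affected node and comparing the two printed names would indicate whether the short name alone is what the node reports for itself; a client that does not apply the same search domain cannot resolve such a name.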
      

      Attachments

        1. TEZ-1924.2.patch (0.9 kB, Ivan Mitic)
        2. TEZ-20.patch (3 kB, Ivan Mitic)

          People

            Assignee: Ivan Mitic (ivanmi)
            Reporter: Ivan Mitic (ivanmi)