Pig / PIG-3446 Umbrella jira for Pig on Tez / PIG-3842

Pig on tez job hangs when AM has a failure and Multiquery fixes

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      For example, when submitting to a wrong queue, the exception below is encountered and the job hangs:

      2014-03-27 19:31:17,981 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1394493512142_22453
      2014-03-27 19:31:17,983 [JobControl] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Cannot submit DAG
      java.lang.RuntimeException: TezSession has already shutdown
              at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.waitForTezSessionReady(TezSessionManager.java:89)
              at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.createSession(TezSessionManager.java:113)
              at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.getSession(TezSessionManager.java:154)
              at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.submit(TezJob.java:92)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
              at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
              at org.apache.pig.backend.hadoop.executionengine.tez.TezJobControl.run(TezJobControl.java:43)
              at java.lang.Thread.run(Thread.java:722)
              at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:102)
      
      1. PIG-3842-4.patch
        152 kB
        Rohini Palaniswamy
      2. PIG-3842-3.patch
        143 kB
        Rohini Palaniswamy
      3. PIG-3842-2.patch
        141 kB
        Rohini Palaniswamy
      4. PIG-3842-1.patch
        139 kB
        Rohini Palaniswamy

        Activity

        Rohini Palaniswamy added a comment -

        Found that TestTezCompiler.testMulitQueryWithSplitMultiVertex passes with JDK 7 (I did generate with JDK 7) but fails with JDK 6 because the plan printing order differs. The printing order needs to be made consistent.
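
One common cause of output order differing between JDK releases is iteration over a hash-based collection, whose order is unspecified. The comment does not say whether that is the cause here, so the following is only a minimal sketch of one way to make printed output stable, assuming the printed elements can be sorted by a key. `PlanPrinter` and the `scope-*` names are hypothetical and are not taken from the Pig code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Pig: prints a name -> description map
// in sorted key order so the output does not depend on HashMap iteration order.
class PlanPrinter {
    static String print(Map<String, String> vertices) {
        StringBuilder sb = new StringBuilder();
        // TreeMap iterates in natural (lexicographic) key order on every JDK.
        for (Map.Entry<String, String> e : new TreeMap<String, String>(vertices).entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> plan = new HashMap<String, String>();
        plan.put("scope-20", "Store");
        plan.put("scope-10", "Load");
        plan.put("scope-15", "Split");
        System.out.print(print(plan));
    }
}
```

Copying into a sorted map at print time leaves the underlying data structure untouched, which keeps the change local to the printing code.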

        Rohini Palaniswamy added a comment -

        Patch committed to tez-branch. Thanks Daniel and Cheolsoo for the review.

        Created PIG-3884 for porting the multi-store counter changes to JobStats, MRJobStats, PigStatsUtil and MRPigStatsUtil to trunk.

        Rohini Palaniswamy added a comment -

        Reviewboard link - https://reviews.apache.org/r/20148/

        Rohini Palaniswamy added a comment -

        Repurposing this jira to also fix some more issues.

        Multiquery issues:

        • Wrong input output counters
        • Nested splits are not merged
        • Multiquery-off mode launches too many jobs because it is not processed in batch. Also, explain did not work in multiquery-off mode.

        Other:

        • TezDAG is no longer required, as DAG now has a getCredentials method.
        • Ensured that the delegation token is fetched only once per NameNode (NN) by having the same DAG Credentials object referenced everywhere.
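
The last point, sharing one credentials object so a token is fetched at most once per NameNode, can be illustrated with a small stand-in. This is a sketch under stated assumptions: `SharedCredentials`, `getOrFetchToken` and the string tokens are hypothetical, and the class deliberately does not use Hadoop's real Credentials API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shared credentials holder; NOT Hadoop's Credentials class.
class SharedCredentials {
    private final Map<String, String> tokens = new HashMap<String, String>();
    private int fetchCount = 0;

    // Returns the token for the given NameNode, fetching it only if it is absent.
    synchronized String getOrFetchToken(String nameNode) {
        String token = tokens.get(nameNode);
        if (token == null) {
            fetchCount++;                      // simulated remote fetch
            token = "token-for-" + nameNode;
            tokens.put(nameNode, token);
        }
        return token;
    }

    synchronized int getFetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        SharedCredentials creds = new SharedCredentials(); // one shared object
        creds.getOrFetchToken("nn1"); // first caller triggers the fetch
        creds.getOrFetchToken("nn1"); // second caller reuses the stored token
        creds.getOrFetchToken("nn2");
        System.out.println("fetches: " + creds.getFetchCount());
    }
}
```

If each caller created its own holder instead, every caller would trigger its own fetch for the same NameNode, which is the duplication the comment describes avoiding.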
        Rohini Palaniswamy added a comment -

        It hangs at

        "main" prio=5 tid=0x00007f87eb000800 nid=0x1703 in Object.wait() [0x000000010bdd4000]
           java.lang.Thread.State: WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                - waiting on <0x0000000147042060> (a org.apache.pig.backend.hadoop.executionengine.tez.TezPlanContainer)
                at java.lang.Object.wait(Object.java:503)
                at org.apache.pig.backend.hadoop.executionengine.tez.TezPlanContainer.getNextPlan(TezPlanContainer.java:121)
                - locked <0x0000000147042060> (a org.apache.pig.backend.hadoop.executionengine.tez.TezPlanContainer)
                at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:75)
                at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:374)
                at org.apache.pig.PigServer.launchPlan(PigServer.java:1395)
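
The thread dump shows the main thread parked in Object.wait() inside getNextPlan. A monitor-based hand-off like this blocks indefinitely if the failing path never notifies the waiter. The following is a minimal sketch of the general pattern for avoiding that, not the committed patch: `PlanQueue`, `markFailed` and `finish` are hypothetical names, and it assumes the submitting thread can report its failure to the object being waited on.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not the PIG-3842 patch: a monitor-based hand-off in which
// the producer can report failure so the waiting thread wakes up instead of hanging.
class PlanQueue {
    private final Queue<String> ready = new ArrayDeque<String>();
    private RuntimeException failure; // set when submission fails
    private boolean done;             // set when no more plans will arrive

    synchronized void offer(String plan) { ready.add(plan); notifyAll(); }
    synchronized void finish() { done = true; notifyAll(); }
    synchronized void markFailed(RuntimeException e) { failure = e; notifyAll(); }

    // Returns the next plan, or null when finished; rethrows a reported failure.
    synchronized String getNextPlan() {
        while (ready.isEmpty() && !done && failure == null) {
            try {
                wait(); // woken by offer(), finish() or markFailed()
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
        }
        if (failure != null) {
            throw failure;
        }
        return ready.poll(); // null if done and nothing is left
    }

    public static void main(String[] args) throws InterruptedException {
        final PlanQueue q = new PlanQueue();
        Thread submitter = new Thread(new Runnable() {
            public void run() {
                // Simulates the submission failure from the description.
                q.markFailed(new RuntimeException("TezSession has already shutdown"));
            }
        });
        submitter.start();
        try {
            q.getNextPlan();
        } catch (RuntimeException e) {
            System.out.println("launch aborted: " + e.getMessage());
        }
        submitter.join();
    }
}
```

The wait condition is rechecked in a loop, so the waiter returns correctly whether the failure is reported before or after it starts waiting.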
        

          People

          • Assignee:
            Rohini Palaniswamy
            Reporter:
            Rohini Palaniswamy
          • Votes:
            0
            Watchers:
            1
