Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    1. TEZ-1372.1.patch
      30 kB
      Bikas Saha
    2. TEZ-1372.2.patch
      40 kB
      Bikas Saha
    3. TEZ-1372.svg
      182 kB
      Gopal V
    4. TEZ-1372.3.patch
      40 kB
      Bikas Saha
    5. TEZ-1372.4.patch
      41 kB
      Bikas Saha

      Issue Links

        Activity

        Hide
        Bikas Saha added a comment -

        Bulk close for jiras fixed in 0.5.0.

        Show
        Bikas Saha added a comment - Bulk close for jiras fixed in 0.5.0.
        Hide
        Bikas Saha added a comment -

        Committed. Thanks for the reviews and trying it out.
        commit 80542718645d96cf23865428a166fe6aa3594a61
        Author: Bikas Saha <bikas@apache.org>
        Date: Fri Aug 8 13:18:13 2014 -0700

        TEZ-1372. Fix preWarm to work after recent API changes (bikas)

        Show
        Bikas Saha added a comment - Committed. Thanks for the reviews and trying it out. commit 80542718645d96cf23865428a166fe6aa3594a61 Author: Bikas Saha <bikas@apache.org> Date: Fri Aug 8 13:18:13 2014 -0700 TEZ-1372 . Fix preWarm to work after recent API changes (bikas)
        Hide
        Bikas Saha added a comment -

        Yeah. Sorry. Uploaded new patch.

        Show
        Bikas Saha added a comment - Yeah. Sorry. Uploaded new patch.
        Hide
        Gopal V added a comment -

        Missed PreWarmVertex.java in diff?

        Show
        Gopal V added a comment - Missed PreWarmVertex.java in diff?
        Hide
        Bikas Saha added a comment -

        Attaching commit patch.

        Show
        Bikas Saha added a comment - Attaching commit patch.
        Hide
        Bikas Saha added a comment -

        Yes. That what I intend to do per comment above.
        https://issues.apache.org/jira/browse/TEZ-1372?focusedCommentId=14087045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14087045
        I will call it PreWarmVertex because per users above it helps make the connect the dots.

        Show
        Bikas Saha added a comment - Yes. That what I intend to do per comment above. https://issues.apache.org/jira/browse/TEZ-1372?focusedCommentId=14087045&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14087045 I will call it PreWarmVertex because per users above it helps make the connect the dots.
        Hide
        Siddharth Seth added a comment -

        Bikas Saha, it'll be useful for the preWarm API to accept it's own Context / entity instead of Vertex itself. Even if this is a very simple "PreWarmContext extends Vertex". It'll let the API evolve independently of Vertex if required.
        Also, we should just be able to run the default PreWarmProcessor if users don't specify any.

        One of the classes has an @author annotation which needs to be removed.

        Show
        Siddharth Seth added a comment - Bikas Saha , it'll be useful for the preWarm API to accept it's own Context / entity instead of Vertex itself. Even if this is a very simple "PreWarmContext extends Vertex". It'll let the API evolve independently of Vertex if required. Also, we should just be able to run the default PreWarmProcessor if users don't specify any. One of the classes has an @author annotation which needs to be removed.
        Hide
        Gopal V added a comment -

        +1, this is good.

        Opened HIVE-7656 to track the corresponding hive changes.

        Show
        Gopal V added a comment - +1, this is good. Opened HIVE-7656 to track the corresponding hive changes.
        Hide
        Gopal V added a comment -

        Bikas Saha: getting to this after HIVE-7601 and HIVE-7639 gets a test run.

        Show
        Gopal V added a comment - Bikas Saha : getting to this after HIVE-7601 and HIVE-7639 gets a test run.
        Hide
        Thaddeus Diamond added a comment -

        Yes we are sure of that. Thanks, this works. Look forward to using it on master!

        Show
        Thaddeus Diamond added a comment - Yes we are sure of that. Thanks, this works. Look forward to using it on master!
        Hide
        Bikas Saha added a comment -

        If you want to override the sleep to something else then just override the preWarmTezCode() method of PreWarmContainer. Its been made protected for this reason.
        waitTillReady() is a temporary. The plan is to allow the prewarm dag get killed when the first real dag comes in. Because the intent of pre-warming is to use the fallow time to do something useful and not come in the way of doing something useful. waitTillReady() is a recommended method that should be called before submitting any DAG (irrespective of prewarm) to ensure that the app master is not busy running another dag. In your case, I am guessing you are sure that App Master is not busy running another DAG already.

        Show
        Bikas Saha added a comment - If you want to override the sleep to something else then just override the preWarmTezCode() method of PreWarmContainer. Its been made protected for this reason. waitTillReady() is a temporary. The plan is to allow the prewarm dag get killed when the first real dag comes in. Because the intent of pre-warming is to use the fallow time to do something useful and not come in the way of doing something useful. waitTillReady() is a recommended method that should be called before submitting any DAG (irrespective of prewarm) to ensure that the app master is not busy running another dag. In your case, I am guessing you are sure that App Master is not busy running another DAG already.
        Hide
        Thaddeus Diamond added a comment -

        Okay great. Set that config up and will ignore the sleep impl for now.

        From an API perspective +1. The one thing is perhaps the call to preWarm(vertex); doesn't have to be followed by tezClient.waitTillReady();? No application would want to start before the DAG for prewarming completes so it could be implicit.

        Show
        Thaddeus Diamond added a comment - Okay great. Set that config up and will ignore the sleep impl for now. From an API perspective +1. The one thing is perhaps the call to preWarm(vertex); doesn't have to be followed by tezClient.waitTillReady(); ? No application would want to start before the DAG for prewarming completes so it could be implicit.
        Hide
        Bikas Saha added a comment -

        The intent is to remove the sleep. If its added as an API now then it would be hard to remove.
        You should consider setting your min held containers config == num prewarmed containers for best results.

        Show
        Bikas Saha added a comment - The intent is to remove the sleep. If its added as an API now then it would be hard to remove. You should consider setting your min held containers config == num prewarmed containers for best results.
        Hide
        Thaddeus Diamond added a comment - - edited

        Woops! I was trying to make it more "sample-like" but yes that is actually supposed to match QueryPreWarmProcessor.class.getName() in the line above.

        QueryPreWarmProcessor would derive from PreWarmProcessor, but is there a way to configure the sleep in the payload? May just be nice so we could allow the user to tune this based on their known configuration.

        Also, just migrated over to the PreWarmProcessor. Very nice syntactic sugar.

        Show
        Thaddeus Diamond added a comment - - edited Woops! I was trying to make it more "sample-like" but yes that is actually supposed to match QueryPreWarmProcessor.class.getName() in the line above. QueryPreWarmProcessor would derive from PreWarmProcessor , but is there a way to configure the sleep in the payload? May just be nice so we could allow the user to tune this based on their known configuration. Also, just migrated over to the PreWarmProcessor . Very nice syntactic sugar.
        Hide
        Bikas Saha added a comment -

        Gopal V Hitesh Shah Review please!

        Show
        Bikas Saha added a comment - Gopal V Hitesh Shah Review please!
        Hide
        Bikas Saha added a comment -

        Thanks for trying it out.

        Why are you using MyDummyProcessor.class.getName() for the name? It does not matter to us, unless you are trying to use it as some key look up that may be better done via some API. Is QueryPreWarmProcessor deriving from the built-in PreWarmProcessor? It may be better in some cases where PreWarmProcessor may be able to initialize Tez lib classes. Also, PreWarmProcessor sleeps for some time. This hack is useful when we dont get containers quickly enough from YARN. Without the sleep the tasks might finish very quickly and container reuse may kick in. So we end up not acquiring preWarmNumberOfContainers.

        Show
        Bikas Saha added a comment - Thanks for trying it out. Why are you using MyDummyProcessor.class.getName() for the name? It does not matter to us, unless you are trying to use it as some key look up that may be better done via some API. Is QueryPreWarmProcessor deriving from the built-in PreWarmProcessor? It may be better in some cases where PreWarmProcessor may be able to initialize Tez lib classes. Also, PreWarmProcessor sleeps for some time. This hack is useful when we dont get containers quickly enough from YARN. Without the sleep the tasks might finish very quickly and container reuse may kick in. So we end up not acquiring preWarmNumberOfContainers.
        Hide
        Thaddeus Diamond added a comment -

        Very nice. It's working. Got it to run with:

        TezClient tezClient = new TezClient(sessionName, tezConfiguration);
        tezClient.start();
        
        ProcessorDescriptor preWarmProcDescriptor = new ProcessorDescriptor(QueryPreWarmProcessor.class.getName());
        Resource resource = Resource.newInstance(tezVertexMemoryMb, tezVertexVCores);
        Vertex vertex = new Vertex(MyDummyProcessor.class.getName(), preWarmProcDescriptor, preWarmNumberOfContainers, resource);
        
        tezClient.preWarm(vertex);
        tezClient.waitTillReady();
        

        Does that look correct to you guys procedurally? Do I need the last line? m2c is that I don't need any more sugar than that as an app developer. Makes sense semantically too, I use my application's processor and normal description to prewarm the client. For good measure I could cache the ProcessorDescriptor and Resource in my example and reuse them later in the application but just change the I/O and Vertex object.

        Show
        Thaddeus Diamond added a comment - Very nice. It's working. Got it to run with: TezClient tezClient = new TezClient(sessionName, tezConfiguration); tezClient.start(); ProcessorDescriptor preWarmProcDescriptor = new ProcessorDescriptor(QueryPreWarmProcessor.class.getName()); Resource resource = Resource.newInstance(tezVertexMemoryMb, tezVertexVCores); Vertex vertex = new Vertex(MyDummyProcessor.class.getName(), preWarmProcDescriptor, preWarmNumberOfContainers, resource); tezClient.preWarm(vertex); tezClient.waitTillReady(); Does that look correct to you guys procedurally? Do I need the last line? m2c is that I don't need any more sugar than that as an app developer. Makes sense semantically too, I use my application's processor and normal description to prewarm the client. For good measure I could cache the ProcessorDescriptor and Resource in my example and reuse them later in the application but just change the I/O and Vertex object.
        Hide
        Thaddeus Diamond added a comment -

        Okay, could we up those two tests? Once I commented out the timeouts they seemed to work. Also, we should probably refactor these into static constants so it's easier to tune.

        Show
        Thaddeus Diamond added a comment - Okay, could we up those two tests? Once I commented out the timeouts they seemed to work. Also, we should probably refactor these into static constants so it's easier to tune.
        Hide
        Bikas Saha added a comment -

        TestContainerReuse is known to occasionally fail. The timeouts are mostly determined using a guess at some same value in which the test should finish given what its doing. For trivial tests we use 5s. For longer one around 30s. For tests that involve mini clusters 60s.

        Show
        Bikas Saha added a comment - TestContainerReuse is known to occasionally fail. The timeouts are mostly determined using a guess at some same value in which the test should finish given what its doing. For trivial tests we use 5s. For longer one around 30s. For tests that involve mini clusters 60s.
        Hide
        Thaddeus Diamond added a comment -

        Varies every time but they typically are in the TestContainerReuse class. Most recently it was testLocalityRequest and testHostResolveAttempt.

        Show
        Thaddeus Diamond added a comment - Varies every time but they typically are in the TestContainerReuse class. Most recently it was testLocalityRequest and testHostResolveAttempt .
        Hide
        Bikas Saha added a comment -

        Which tests? TestSecureShuffle is known and is being tracked as a blocker. It has been temporarily disabled.

        Show
        Bikas Saha added a comment - Which tests? TestSecureShuffle is known and is being tracked as a blocker. It has been temporarily disabled.
        Hide
        Thaddeus Diamond added a comment - - edited

        Sorry, grabbed the first one accidentally. Got it to build the JARs with patch.2 and am now testing it. Sidenote: what's up with the tests timing out? I mentioned it on another thread and my developer env specs, but am still seeing tez-dag unable to build because some tests timeout (I used the source JAR from tez-dag to compile my project against, it's generated before the package target). How are the timeouts determined?

        Show
        Thaddeus Diamond added a comment - - edited Sorry, grabbed the first one accidentally. Got it to build the JARs with patch.2 and am now testing it. Sidenote: what's up with the tests timing out? I mentioned it on another thread and my developer env specs, but am still seeing tez-dag unable to build because some tests timeout (I used the source JAR from tez-dag to compile my project against, it's generated before the package target). How are the timeouts determined?
        Hide
        Bikas Saha added a comment - - edited

        That latest patch builds fine for me. Dont see prewarmcontext anywhere. Do you have any local edits?

        Show
        Bikas Saha added a comment - - edited That latest patch builds fine for me. Dont see prewarmcontext anywhere. Do you have any local edits?
        Hide
        Thaddeus Diamond added a comment -

        This fails to build because the OrderedWordCount class in mapreduce-examples uses the PreWarmContext.

        Show
        Thaddeus Diamond added a comment - This fails to build because the OrderedWordCount class in mapreduce-examples uses the PreWarmContext .
        Hide
        Bikas Saha added a comment -

        Patch with unit tests.

        Show
        Bikas Saha added a comment - Patch with unit tests.
        Hide
        Bikas Saha added a comment -

        Gopal V Let me know if creating a new class called PreWarmTemplate that derives from Vertex helps the API sugar. Please review the currently attached patch.

        Show
        Bikas Saha added a comment - Gopal V Let me know if creating a new class called PreWarmTemplate that derives from Vertex helps the API sugar. Please review the currently attached patch.
        Hide
        Bikas Saha added a comment -

        Re-attaching patch because it missed a new file.

        Show
        Bikas Saha added a comment - Re-attaching patch because it missed a new file.
        Hide
        Thaddeus Diamond added a comment -

        This looks right on. I'll try it tomorrow (on EST).

        Show
        Thaddeus Diamond added a comment - This looks right on. I'll try it tomorrow (on EST).
        Hide
        Bikas Saha added a comment -

        Attaching patch. Thaddeus Diamond Perhaps you can try out the patch and let us know if it works. Gopal V Can you see if the patch makes sense. OrderedWordCount is updated to use the new code. New javadocs.

        Show
        Bikas Saha added a comment - Attaching patch. Thaddeus Diamond Perhaps you can try out the patch and let us know if it works. Gopal V Can you see if the patch makes sense. OrderedWordCount is updated to use the new code. New javadocs.
        Hide
        Bikas Saha added a comment -

        It will still be session.preWarm(FOO) where FOO is Vertex instead of PreWarmContext.

        Show
        Bikas Saha added a comment - It will still be session.preWarm(FOO) where FOO is Vertex instead of PreWarmContext.
        Hide
        Thaddeus Diamond added a comment -

        Linking TEZ-1370 because this ticket was borne out of troubles I had there.

        Show
        Thaddeus Diamond added a comment - Linking TEZ-1370 because this ticket was borne out of troubles I had there.
        Hide
        Thaddeus Diamond added a comment -

        I like Gopal V's idea, but am happy to make a call like:

        DAG preWarmDag = new PreWarmDAG(<container_info>);
        tezClient.submitDag(preWarmDag);
        tezClient.waitTillReady();
        
        Show
        Thaddeus Diamond added a comment - I like Gopal V 's idea, but am happy to make a call like: DAG preWarmDag = new PreWarmDAG(<container_info>); tezClient.submitDag(preWarmDag); tezClient.waitTillReady();
        Hide
        Gopal V added a comment -

        1) Remove PreWarmContext and startPreWarmContainers. Use Vertex and submitDAG instead

        I was hoping for an ultra simple session.preWarm() syntactic sugar wrapping this particular codepath calling submitDAG.

        Show
        Gopal V added a comment - 1) Remove PreWarmContext and startPreWarmContainers. Use Vertex and submitDAG instead I was hoping for an ultra simple session.preWarm() syntactic sugar wrapping this particular codepath calling submitDAG.
        Hide
        Bikas Saha added a comment - - edited

        Plan or action
        1) Remove PreWarmContext and startPreWarmContainers. Use Vertex and submitDAG instead. Using Vertex seems to be the only way to keep it consistent with other changes in Vertex without replicating them in PreWarmContext. This way the user knows that they need to do the same thing with this vertex as with any other vertex in their DAG.
        2) Because of 1), DAG creation moves to client. DAG name can still be used to identify pre-warm DAG.
        3) The current code mentions kill prewarm dag but doesn't do it. Fix that. This would be in the AM since it knows if the current DAG is the pre-warm DAG or not and kill it. Alternatively, we could require users to check TezClient.waitTillReady() before submitting the real DAG which they anyways should be doing currently. Prefer the latter because it more consistent and less work. Typically preWarm is used in the fallow time when the application starts up.

        Show
        Bikas Saha added a comment - - edited Plan or action 1) Remove PreWarmContext and startPreWarmContainers. Use Vertex and submitDAG instead. Using Vertex seems to be the only way to keep it consistent with other changes in Vertex without replicating them in PreWarmContext. This way the user knows that they need to do the same thing with this vertex as with any other vertex in their DAG. 2) Because of 1), DAG creation moves to client. DAG name can still be used to identify pre-warm DAG. 3) The current code mentions kill prewarm dag but doesn't do it. Fix that. This would be in the AM since it knows if the current DAG is the pre-warm DAG or not and kill it. Alternatively, we could require users to check TezClient.waitTillReady() before submitting the real DAG which they anyways should be doing currently. Prefer the latter because it more consistent and less work. Typically preWarm is used in the fallow time when the application starts up.

          People

          • Assignee:
            Bikas Saha
            Reporter:
            Bikas Saha
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development