Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Tags:
      webapp, mrv2, mvc, web, ui

      Description

      People have expressed interest in bringing the Hadoop web UI up to date with the lightweight embedded web MVC framework used in MAPREDUCE-279 (cf. MAPREDUCE-2399). This is the umbrella JIRA for UI improvements for 0.23+. Individual items, such as the web framework refactor/move and the unique challenge of MR2 webapp security, will be filed separately.

      Thoughts and ideas on improving the Hadoop web UI are welcome here. Some of the ideas will naturally become sub-issues.

        Activity

        Robert Joseph Evans added a comment -

        I have a few concerns about the current framework that I would like to have addressed before it rolls out to the rest of Hadoop. Most of these are minor nits I have from my current work with the framework to split the history server UI away from the application master UI.

        1. Actual HTML4 strict compliance. (This is mostly because HTML strict compliance is touted a lot in MAPREDUCE-2399.) I am not an expert on HTML, but I think that

          <!--================ Document Structure ==================================-->
          <!ENTITY % html.content "HEAD, BODY">

          <!ELEMENT HTML O O (%html.content;)    -- document root element -->
          <!ATTLIST HTML
            %i18n;                               -- lang, dir --
            >

          from http://www.w3.org/TR/html401/sgml/dtd.html says that the only thing under HTML should be HEAD or BODY, but none of the generated pages I have seen include a head or a body tag.
        2. JavaScript (jQuery) code not being part of the model. Right now, if I have a subview that uses some JavaScript initialization, that initialization has to be thrown into the model that is generated by the parent view. See NavBlock.java and AppView.java lines 37 and 38: the two classes are now highly coupled, and it is very difficult, if not impossible, to replace NavBlock.java with something else. This is also true for the various data tables that are used in the code.
        3. Common jQuery widgets. We use data tables all over the place, and much of the code for them is the same or very similar everywhere. It would be very nice to have a common DataTable SubView on which a developer can set the column types, which columns should support sorting, etc., and then just populate it with the data, instead of manually making almost identical changes in many different places to support a jQuery feature not used before.
        4. Odd injection from the stack. JobBlock.java line 98 renders an InfoBlock.class, which has a ResponseInfo instance injected into it. It is very confusing that lines 90 through 96 are somehow creating the ResponseInfo instance that will be injected into the InfoBlock. In my opinion it is much cleaner not to rely on Guice to inject the dependencies in this case.
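
        To make item 3 concrete, here is a minimal sketch of what a shared table helper could look like. Everything in it (the DataTableSketch class and its column, row, html and initScript methods) is hypothetical and not part of the MR-279 framework. It only illustrates declaring the columns once and deriving both the markup and the jQuery DataTables init call from that single definition; the aoColumns/bSortable option names assume the legacy DataTables 1.x API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; NOT part of the MR-279 webapp framework. */
final class DataTableSketch {
  private final String id;
  private final List<String> headers = new ArrayList<>();
  private final List<Boolean> sortable = new ArrayList<>();
  private final List<List<String>> rows = new ArrayList<>();

  DataTableSketch(String id) { this.id = id; }

  /** Declares one column and whether the client may sort on it. */
  DataTableSketch column(String header, boolean isSortable) {
    headers.add(header);
    sortable.add(isSortable);
    return this;
  }

  /** Adds one data row; the cell count must match the declared columns. */
  DataTableSketch row(String... cells) {
    if (cells.length != headers.size()) {
      throw new IllegalArgumentException("expected " + headers.size() + " cells");
    }
    rows.add(new ArrayList<>(Arrays.asList(cells)));
    return this;
  }

  /** Plain table markup; usable even when JavaScript is unavailable. */
  String html() {
    StringBuilder sb = new StringBuilder();
    sb.append("<table id=\"").append(id).append("\"><thead><tr>");
    for (String h : headers) sb.append("<th>").append(escape(h)).append("</th>");
    sb.append("</tr></thead><tbody>");
    for (List<String> r : rows) {
      sb.append("<tr>");
      for (String c : r) sb.append("<td>").append(escape(c)).append("</td>");
      sb.append("</tr>");
    }
    return sb.append("</tbody></table>").toString();
  }

  /** Init call generated from the same column definitions (legacy DataTables options assumed). */
  String initScript() {
    StringBuilder sb = new StringBuilder("$('#").append(id).append("').dataTable({\"aoColumns\":[");
    for (int i = 0; i < sortable.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append("{\"bSortable\":").append(sortable.get(i)).append('}');
    }
    return sb.append("]});").toString();
  }

  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }
}
```

        A view would then call something like new DataTableSketch("jobs").column("Job", true).column("State", false).row("job_1", "RUNNING") and emit html() in the block and initScript() in the page's script section.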

        Like I said, these are a few minor annoyances, but they add up and make changing a page more difficult. For the Job History server I had to copy several classes with almost no changes in them simply to replace the NavBlock with something more appropriate for the History Server.

        Luke Lu added a comment -

        1. HTML4 strict compliance.

        The two O's in the DTD mean that the tags are optional (both start and end tags). Omitting HEAD/BODY in the existing code is intentional, as this makes composing valid HTML much easier. I can give you detailed examples if this is not clear enough. Please report any invalid HTML the framework produces. I'm already aware of a few cases (missing alt attribute in img etc.) but this is not one of them.

        2. JavaScript (JQuery) code not being part of the model.

        I guess the "model" you're talking about is the composable blocks, as "model" in an MVC framework has a very different meaning, which is not relevant to this particular discussion. One goal of the framework is progressive enhancement. The separation of JS and document structure is intentional. JavaScript is mostly kept in the init sections of views. OTOH, I can totally understand where you're coming from. The current simplistic but flexible implementation couples certain view/page methods (init methods) with block implementations through the HTML ids of the blocks. I personally don't find it particularly difficult to override the various init methods (commonPreHead etc.) to customize different views. Keep in mind, we need to have the choice of different initialization (active tab etc.) of the same block for different views based on the query. The current way is less than ideal and I have some ideas to make it better. I welcome suggestions to make it even easier.

        3. Common widgets.

        You're totally right. I actually have a table builder in the plan to abstract much of the cruft (mostly for adaptive server-side rendering for huge tables) away from the blocks. OTOH, the current pattern is actually quite straightforward: write the doc structure in the blocks and write the corresponding js init code in views.

        4. InfoBlock in JobBlock.

        Yes, the info block is a bit of a hack. The initial use case of the info block is a common about/attribute block for all the webapps. I usually put the construction of ResponseInfo in the controller, so that I don't have to change views at all. The JobBlock is a special case to reuse the logic. I admit that it's not very intuitive.

        Thanks for all the points raised here, Bobby. The main goal of the framework itself is making such improvements/refactoring transparent, type safe and (relatively) straightforward (as it's pure Java without all the arcane (and still unsafe) DSLs in existing frameworks). IMO, the fact that you refactored the jobhistory server quickly without asking any questions is a testament to the ease of use of the framework. Again, I welcome suggestions to make it better.

        Aaron T. Myers added a comment -

        Hey Luke, I realize this feedback is really late to the game, but I remain unconvinced that introducing a novel web framework into the Hadoop codebase is necessary.

        Most of the reasons you provided previously in MAPREDUCE-2399 seem to be against MVC frameworks as a whole because Hadoop really only needs a templating engine with a little bit of controller help. I agree Hadoop probably shouldn't incorporate a full-blown MVC framework, but there exist several template-only frameworks out there which might work. Can you perhaps enumerate which existing libraries you looked at, and why you ruled them out? In particular, did you look at Jamon? (http://www.jamon.org/)

        Furthermore, I don't find the "every Java developer can pickup [Hamlet] without having to learn any new (expression) language syntax" argument terribly compelling. Developers who want to make changes to a Hadoop web page will now need to learn Hamlet, which has its own set of things to learn, and presently has a dearth of documentation, developers with experience using it, or a community supporting it. Using Hamlet for all Hadoop web UIs also serves to hinder many developers who do have some familiarity with HTML and already-existing templating engines. From looking at the code which uses Hamlet in MR-279, it honestly seems to me to be strictly more confusing than using a traditional templating system.

        To be completely clear here, I'm not vetoing this. I'm mostly trying to play the devil's advocate. The Hadoop project has re-invented many wheels, and I'm skeptical we need to re-invent the web templating wheel as well.

        Luke Lu added a comment -

        Thanks for the feedback, Aaron. Not sure if you've actually used the new framework in an IDE. Here are some unique advantages of the framework:

        1. It's secure: content (including attributes) is always escaped unless you resort to the raw response writer, which is usually not necessary. I'll make the raw writer even harder to use, as I haven't found a legitimate use for it in user code so far. This makes XSS practically impossible (apart from explicit script element handling);
        2. IDE/refactor friendly (pure Java, live HTML help/hints without plugins);
        3. Statically validates HTML (you can still generate bad HTML if you really try, but you can generate valid HTML much more easily).
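
        As a rough illustration of the escape-by-default idea in point 1, the sketch below escapes both element text and attribute values before they reach the output. The EscapeSketch class and its escape and paragraph methods are made up for this example; this is not the framework's actual code, only the general technique.

```java
/** Hypothetical illustration of escape-by-default output; NOT the framework's code. */
final class EscapeSketch {
  /** Replaces the characters that are significant in HTML text and attribute values. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&':  sb.append("&amp;");  break;
        case '<':  sb.append("&lt;");   break;
        case '>':  sb.append("&gt;");   break;
        case '"':  sb.append("&quot;"); break;
        case '\'': sb.append("&#39;");  break;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Every value passes through escape(); the caller has no unescaped path. */
  static String paragraph(String id, String text) {
    return "<p id=\"" + escape(id) + "\">" + escape(text) + "</p>";
  }
}
```

        Because the only way to emit a value is through a method that escapes it, user-supplied strings such as a job name containing a script tag end up as inert text.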

        None of the existing template engines (including jamon) has these properties. I have used many web frameworks and template engines and I honestly prefer Hamlet for views, as I can pretty much (with IDE hints) guarantee a correct layout when I finish typing, without the tedious modify, reload, inspect/validate cycles.

        Vinod and Bobby did a substantial amount of UI work with the new framework without having to ask me any framework-related questions, which, IMO, is significant evidence that the framework is fairly easy to use/explore. Excluding the generated code, the source of the entire framework (controller, router and basic view support) is only a couple of KLOC. The framework in MR-279 hasn't changed since the MR-279 branch was created (which, IMO, is a testament to its stability). It will be significantly improved when it's moved to hadoop-common, based on the experience of people besides myself.

        Here is a complete 3 in 1 hello world example for the framework.

        import org.apache.hadoop.webapp.Controller;
        import static org.apache.hadoop.webapp.WebApps.newWebApp;
        import org.apache.hadoop.webapp.view.HtmlPage;
        
        /**
         * The obligatory example. No xml/jsp/templates/config files! No
         * proliferation of strange annotations either :)
         *
         * <p>3 in 1 example. Check results at
         * <br>http://localhost:8888/hello and
         * <br>http://localhost:8888/hello/html
         * <br>http://localhost:8888/hello/json
         */
        public class HelloWorld {
          public static class Hello extends Controller {
            @Override public void index() { renderText("Hello world!"); }
            public void html() { setTitle("Hello world!"); }
            public void json() { renderJSON("Hello world!"); }
          }
        
          public static class HelloView extends HtmlPage {
            @Override protected void render(Page.Root html) {
              html.
                title($("title")).
                p("#hello-style").
                  _($("title"))._()._();
            }
          }
        
          public static void main(String[] args) {
            newWebApp().at(8888).inDevMode().start().joinThread();
          }
        }
        
        Robert Joseph Evans added a comment -

        Luke, I do have to say that starting out with Hamlet was confusing for me. I did not ask for help because I usually read through all the documentation I can find for the libraries I am using, and even resort to reading their source code, before I ask for help. It is probably a bad habit of mine, but I find that I gain a much more in-depth knowledge of the library and its potential quirks than by just having someone tell me to copy and paste those classes over there and change these fields.

        In relation to your comment about progressive enhancement, I do not see how forcing all JavaScript on the page to go through the top-level view improves progressive enhancement. I understand that you want to separate the structure from the code, and I totally agree with that. The point of separating the structure from the code is that if JavaScript is not available, the page will still render. It may not offer all of the functionality that would otherwise be available, but it renders. Progressive enhancement then adds more functionality in the JavaScript, piece by piece, based on what the browser supports.

        Yes, the top-level view does need to be able to help with the initialization, but I think a lot of these problems come from trying to delay instantiation of a SubView until it is rendered. I would rather see the subviews created before rendering, then initialized, and finally rendered.
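
        A minimal sketch of that create, initialize, render ordering follows. The LifecycleSketch class, its SubView interface and the NavBlock implementation are hypothetical names invented for this illustration and do not match the framework's real SubView API. The point is only that each subview contributes its own init script in a separate phase, so the parent page no longer needs to know the subview's HTML ids.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical three-phase lifecycle; names do NOT match the real framework API. */
final class LifecycleSketch {
  interface SubView {
    /** Phase 2: the subview registers whatever JavaScript it needs. */
    void init(List<String> scripts);
    /** Phase 3: the subview writes its own markup. */
    void render(StringBuilder out);
  }

  /** Example subview that owns both its markup and its init code. */
  static final class NavBlock implements SubView {
    @Override public void init(List<String> scripts) {
      scripts.add("$('#nav').accordion();");
    }
    @Override public void render(StringBuilder out) {
      out.append("<div id=\"nav\"></div>");
    }
  }

  /** Phase 1 (creation) happens in the caller, before this method runs. */
  static String renderPage(List<SubView> views) {
    List<String> scripts = new ArrayList<>();
    for (SubView v : views) v.init(scripts);   // initialize every subview first
    StringBuilder out = new StringBuilder();
    for (SubView v : views) v.render(out);     // then render the structure
    out.append("<script type=\"text/javascript\">");
    for (String s : scripts) out.append(s);    // scripts stay separate from structure
    out.append("</script>");
    return out.toString();
  }
}
```

        Swapping NavBlock for a history-server-specific block would then only require passing a different SubView instance to renderPage.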

        Another nit that comes to mind concerns CSS: TwoColumnLayout uses a table to place elements in different locations on the page. I thought that using a table for layout was an HTML anti-pattern. I don't know which browsers we are targeting, so I cannot really say whether it is necessary for proper layout. However, it is such basic CSS that getting it supported on all browsers that jQuery supports should not be that difficult.

        Luke Lu added a comment -

        I do have to say that starting out with Hamlet was confusing for me.

        I guess the confusion comes mostly from the composition of the existing webapps rather than the framework? Since you don't have to learn any new syntax (like wtf is <%& etc. in a template system), the context-sensitive help/hints will make you productive without consulting any separate documentation. I bet that once you get the system, you can come back a couple of months later (after working on non-UI projects) and be immediately productive without reading any documentation. The same can't be said for any template system that has its own particular syntax.

        I do not see how forcing all java script on the page to go through the top level view improves progressive enhancement.

        I didn't say that. Progressive enhancement is a goal and the existing implementation is an easy but obviously not the best way to do it. I can see the desire to improve the composability of blocks/views, without having to do any subclassing and overriding, which is relatively heavy in boilerplate code in Java.

        I don't oppose ideas to include some init logic per block definition, which gets called when the block is rendered. How about something like this:

        render(TwoColumnLayout.builder().
               withNav(MyNavBlock.builder().active(1)).
               withContent(MyContentBlock.class));
        

        This basically enhances the render API to also accept a ViewBuilder interface to do delayed rendering.
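
        One way the proposed builder-based render call could be modelled is sketched below. Only the names render, withNav, withContent and active come from the proposal above; the ViewBuilderSketch wrapper, the ViewBuilder interface shape, NavBuilder and TwoColumnBuilder are assumptions made for illustration. Unlike the proposal, withContent here takes a builder rather than a Class, to keep the sketch free of dependency injection.

```java
/** Hypothetical model of the proposed builder-based render API. */
final class ViewBuilderSketch {
  /** A configured but not yet rendered view; build() performs the delayed rendering. */
  interface ViewBuilder {
    String build();
  }

  /** Block-specific configuration lives with the block, not with the parent view. */
  static final class NavBuilder implements ViewBuilder {
    private int active = 0;
    NavBuilder active(int index) { this.active = index; return this; }
    @Override public String build() {
      return "<div id=\"nav\" class=\"active-" + active + "\"></div>";
    }
  }

  /** Layout that composes two child builders without subclassing. */
  static final class TwoColumnBuilder implements ViewBuilder {
    private ViewBuilder nav;
    private ViewBuilder content;
    TwoColumnBuilder withNav(ViewBuilder b) { this.nav = b; return this; }
    TwoColumnBuilder withContent(ViewBuilder b) { this.content = b; return this; }
    @Override public String build() {
      if (nav == null || content == null) {
        throw new IllegalStateException("both nav and content are required");
      }
      return "<table><tr><td>" + nav.build() + "</td><td>" + content.build()
          + "</td></tr></table>";
    }
  }

  /** Nothing is rendered until the fully configured builder is handed over. */
  static String render(ViewBuilder builder) {
    return builder.build();
  }
}
```

        With this shape, replacing the navigation block is a matter of passing a different builder to withNav, which addresses the coupling concern without overriding init methods.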

        I thought that using table to do layout was an HTML anti-pattern.

        I knew the question would eventually come up, so I preserved TwoColumnCssLayout in the source. Read it with the comments and compare it with TwoColumnLayout for the specific reasons why CSS is bad for this particular but simple layout I want. I think the current generation of CSS is bad for UI (vs. content) layout in general (most CSS layouts are fixed-grid based and cannot handle dynamic containees correctly), and tables are just a last resort.

        If anyone can solve the problem cleanly, it's easy to rename the layout class in the library without changing webapp code (the same can't be said for most existing frameworks, like Rails and Play, which generate a layout per webapp).

        Aaron T. Myers added a comment -

        It's secure (always escape content (including attributes) unless you resort to use raw response writer

        Certainly many (if not most) templating frameworks (Jamon, Mako, etc) do this. Nothing novel there.

        IDE/refactor friendly (pure java, live html help/hints without plugins);

        While that's certainly true about Hamlet, I question whether or not that's a goal which warrants writing/maintaining a brand-new templating framework.

        Statically validates html (you can still generate bad html if you really try but you can generate valid html much easier).

        Certainly there exist templating frameworks which can do this, or at least we could wire in one of the many HTML validators out there to do this when running the tests. This is what I've done in previous web projects with great success. Again, this feature doesn't seem to warrant writing/maintaining a brand new templating system.

        Since you don't have to learn any new syntax (like wtf is <%& etc. in a template system)

        You don't have to learn a new syntax per se, but you do have to learn a new (and IMO arcane) Java library. For example, what's with the "$style" method, or the many calls to "_()" all over the code? These were questions I had when first looking at the code which uses Hamlet, and I'm guessing Bobby had the same sort of questions. By using Hamlet, we're cutting off those developers who do have some experience with existing templating systems in favor of a new system which literally no one has experience with. As someone who's used several templating systems in other projects before, I can say with confidence that being required to use Hamlet will make me less productive than a traditional templating system would.
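
        For readers with the same questions: judging from the hello world example earlier in this thread, "_()" appears to close the current element and return its parent, and "$"-prefixed methods appear to set attributes. The sketch below shows how such a typed fluent builder can work in general. It is not Hamlet itself; FluentSketch, Body, Ul and Li are invented names, escaping is omitted for brevity, and end() stands in for "_()" because a lone underscore has been a reserved keyword since Java 9.

```java
/** Hypothetical typed fluent HTML builder; NOT the real Hamlet API. */
final class FluentSketch {
  /** Root context: only the children it declares (here just ul) are available. */
  static final class Body {
    private final StringBuilder out = new StringBuilder();
    Ul<Body> ul() { out.append("<ul"); return new Ul<>(this, out); }
    /** Closes the root and yields the markup. */
    String end() { return out.toString(); }
  }

  /** A list element; the type parameter P remembers which parent to return to. */
  static final class Ul<P> {
    private final P parent;
    private final StringBuilder out;
    private boolean startTagOpen = true;
    Ul(P parent, StringBuilder out) { this.parent = parent; this.out = out; }

    /** Attribute setter, playing the role of the "$"-prefixed methods. */
    Ul<P> $id(String id) {
      if (!startTagOpen) throw new IllegalStateException("attributes must precede children");
      out.append(" id=\"").append(id).append('"');
      return this;
    }
    /** The only child a list offers is a list item. */
    Li<Ul<P>> li() { closeStartTag(); out.append("<li>"); return new Li<>(this, out); }
    /** Plays the role of "_()": writes the end tag and returns the parent. */
    P end() { closeStartTag(); out.append("</ul>"); return parent; }

    private void closeStartTag() {
      if (startTagOpen) { out.append('>'); startTagOpen = false; }
    }
  }

  /** A list item holding text. */
  static final class Li<P> {
    private final P parent;
    private final StringBuilder out;
    Li(P parent, StringBuilder out) { this.parent = parent; this.out = out; }
    Li<P> text(String s) { out.append(s); return this; }  // a real builder would escape here
    P end() { out.append("</li>"); return parent; }
  }
}
```

        A call chain such as new FluentSketch.Body().ul().$id("nav").li().text("Jobs").end().end().end() compiles, whereas calling li() directly on Body does not, which is the kind of compile-time structure check the static-validation claim refers to.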

        I don't want to keep harping on this, as I don't feel super strongly about it, but it still seems like a mistake to me to be introducing a novel web templating framework into Hadoop. The Hadoop project, IMO, shouldn't be in the business of web templating. Rather, the Hadoop daemons just have a few web pages they would like to display. Why should we re-invent this wheel which has already been invented many times before? If I'm the only one who feels this way, then I'll shut up about it.

        I know HBase uses Jamon for rendering its web pages. Perhaps someone with experience developing those can comment?

        Todd Lipcon added a comment -

        I was responsible for switching HBase from JSP to Jamon a few months back. My reasons for picking Jamon are outlined on that JIRA: HBASE-3835

        I also agree with Aaron that the Java syntax for building a page looks "arcane". This may be because I have a reasonable amount of experience with HTML, having worked on webapps for several years in the past (using first plain JSP, then HTML::Mason, then PHP Smarty, and most recently Jamon).

        Regarding the tedious reload cycle, one advantage of Jamon is that it can be run in a mode where edits to the templates show up "live" for development - so there's not an entire project rebuild for the edit/layout cycle.

        Eli Collins added a comment -

        Re-implementing a templating engine feels out of scope to me as well. My main concern is not about the quality of Hamlet - I assume it's good stuff - but that a Hadoop-only web framework is not going to be actively advanced like other projects, so in a couple of years we'll be on something that's pretty out of date and that only a small number of people know how to change (like our RPC library and serialization framework).

        Luke - have you considered making Hamlet into a separate project and having Hadoop consume this project? I'd be less worried about the above if there was a community built around Hamlet. The advantages of Hamlet you've outlined should make it useful to any project and therefore would attract a community. Similarly, if MR is the only project willing to adopt Hamlet then I wonder if it's a wise choice.

        Robert Joseph Evans added a comment -

        I guess the confusion is mostly from the composition of the existing webapps rather than the framework?

        My biggest confusion was with Guice and how it integrates with Hamlet/Jetty. I like inversion of control (IoC) for the flexibility that it can offer, especially with testing. I have used Spring extensively in the past, but IoC always comes with a steep learning curve, even more so when attributes are used to tie the dependencies to where they are injected. This is because there are no clean, direct links between implementation and usage in the code. Yes, an XML description of the dependencies is often worse, but at least it is mostly in a single place and no real magic is happening behind the scenes where objects just appear. Granted, this was not as steep a curve because there was not as much of the Interface/Implementation split that you see in typical IoC programming. I did not have to look at the class hierarchy as much to figure out what is going where. But the lack of it confused me a bit because I was expecting it.

        The other big thing to confuse me was the composable blocks as you called them. It took me a while to trace down to the deep layers of JQueryUI, View, etc. to see where each of the parameters were being set and used to know what I had to do to change them. Some javadocs with simple examples would be a big help in understanding how to use them properly.

        Since you don't have to learn any new syntax (like wtf is <%& etc. in a template system)

        $() and _() threw me for a bit of a loop too. Not as bad as <%& because it is not a new language. Typically in java they would be something like getContextValue() and endBlock() respectively. I understand the desire to make the code smaller and more compact but that too is confusing.
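
        To make the naming trade-off concrete, here is a minimal sketch of a fluent builder that uses a verbose closer in the style suggested above. Everything in it (TinyHtml, begin, text, endBlock, render) is hypothetical and invented for illustration; it is not Hamlet's API, it only shows what call sites might look like with descriptive names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical mini builder for illustration only; NOT Hamlet's actual API. */
final class TinyHtml {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<>();

    /** Opens an element and remembers it so endBlock() knows what to close. */
    TinyHtml begin(String tag) {
        out.append('<').append(tag).append('>');
        open.push(tag);
        return this;
    }

    /** Appends text content, escaping the characters that are special in HTML text. */
    TinyHtml text(String s) {
        out.append(s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
        return this;
    }

    /** Closes the most recently opened element (a verbose stand-in for a terse closer). */
    TinyHtml endBlock() {
        out.append("</").append(open.pop()).append('>');
        return this;
    }

    /** Returns the markup, refusing to render if an element was left open. */
    String render() {
        if (!open.isEmpty()) {
            throw new IllegalStateException("unclosed element: " + open.peek());
        }
        return out.toString();
    }
}
```

        With this sketch, new TinyHtml().begin("ul").begin("li").text("job_1").endBlock().endBlock().render() yields <ul><li>job_1</li></ul>. The longer names cost a few characters per call but read without a glossary.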

        I do think that it is a fairly simple framework to use, once you have overcome the learning curve. Documentation and simple examples could go a very long way in making the system more accessible to new developers. Without them I can see lots of other people wanting to go back to JSPs etc., simply because it is something they know and can work on without having to learn something new.

        I didn't say that.

        Sorry, I misunderstood you. I like the builder snippet that you put in, I would like it if we could look at moving most of the common HTML generation widgets to something like that.

        If anyone can solve the problem cleanly.

        I trust you to have done your due diligence on this. Like I said previously I am not an expert on HTML and I am not a web developer. I know enough to be dangerous which is why I was asking.

        Luke Lu added a comment -

        Certainly many (if not most) templating frameworks (Jamon, Mako, etc) do this. Nothing novel there.

        No, they're not enforced: you can easily turn off escaping in Jamon etc. with one character. If the enforcement depends on humans, it's not enforced in practice. You cannot even easily write a test-patch to check it, because you NEED to turn the escaping off MANUALLY for script and style elements. Hamlet can enforce this automagically.

        Another advantage I forgot to mention is the enforcement of HTML coding guidelines (for style or security reasons), say forbidding certain tags (font, center etc.) and attributes (all the on* event attributes). You cannot do this in any existing template system. Period.
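
        One way to read "enforced by the API" is sketched below: a pure Java builder that escapes on every write and simply has no methods for disallowed tags or event attributes, so using them fails at compile time. The class SafePage and its methods are hypothetical names made up for this illustration, not Hamlet's implementation, and the allowed set (p, a) is an assumed example policy.

```java
/** Hypothetical policy-enforcing page builder; illustrative only, NOT Hamlet's API. */
final class SafePage {
    private final StringBuilder out = new StringBuilder();

    /** Escapes characters that are significant in HTML text and quoted attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Paragraph: the text is always escaped; there is no overload taking raw markup. */
    SafePage p(String text) {
        out.append("<p>").append(escape(text)).append("</p>");
        return this;
    }

    /** Link: both the href value and the link text are escaped (URL scheme checks omitted). */
    SafePage a(String href, String text) {
        out.append("<a href=\"").append(escape(href)).append("\">")
           .append(escape(text)).append("</a>");
        return this;
    }

    // Deliberately no font(), center(), or onclick(...) methods:
    // a call to any of them does not compile, so the policy needs no reviewer.

    String render() {
        return out.toString();
    }
}
```

        For example, new SafePage().p("<script>alert(1)</script>").render() produces <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>. The design choice is that the rule lives in the type signature rather than in a convention that each template author has to remember.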

        You don't have to learn a new syntax per se, but you do have to learn a new (and IMO arcane) Java library.

        If you use an IDE, context-sensitive help provides the description for the methods right there. I guess I can say Hamlet discriminates against non-IDE users, which is true in general for Java as well.

        Why should we re-invent this wheel which has already been invented many times before?

        The old wheel is not round/secure/robust enough? In MRv2, mapreduce becomes a user-land library, and we'll allow users to have their own AM with webapps, which opens up a whole new security threat model (MAPREDUCE-2858). Having a light-weight framework to help enforce security is completely novel.

        Robert Joseph Evans added a comment -

        Rather, the Hadoop daemons just have a few web pages they would like to display.

        I personally would argue that none of the Hadoop daemons should be providing a web interface, except possibly a web service, preferably RESTful. I would like to see all of the GUIs moved to a separate server (or servers) that just acts as a client to the NN/RM etc. and provides an interface on top of that. We currently have several tools that scrape the GUI pages to pull out some critical monitoring or job information that is not currently available through an API (this is on 0.20.204; I have not looked into it deeply on trunk). If we enforced a strict separation then there would never be a need to scrape pages, because all of the information would be available through an API. Also, the daemons could concentrate on their primary purpose, whatever it is, and not have to deal with web servers, alternate forms of authentication/authorization, etc.

        Luke Lu added a comment -

        Certainly there exist templating frameworks which can do this, or at least we could wire in one of the many HTML validators out there to do this when running the tests.

        I'm curious. Which templating frameworks statically validate html for all possible combinations of queries? Having to install html validators and write integration tests (for certain inputs) sounds like a lot more work to me than getting it right in a simple java builder with the help of an IDE.

        Aaron T. Myers added a comment -

        No, they're not enforced, you can easily turn off escaping with Jamon etc with one character.

        "They're enforced by default but easy to turn off" is quite different from "they're not enforced." It seems to me the main differentiator Hamlet has in this regard is that it's possible but exceptionally difficult to disable escaping. I don't see this as terribly compelling.

        Another advantage I forgot to mention is the enforcement of HTML coding guidelines (for style or security reasons), say forbid certain tags (font, center etc.) and attributes (all the on* event attributes.). You cannot do this in any existing template systems. period.

        I didn't realize that. That is kind of neat.

        The old wheel is not round/secure/robust enough?

        Which wheels did you check out and find to be not round enough? I previously asked "Can you perhaps enumerate which existing libraries you looked at, and why you ruled them out?"

        Which templating frameworks statically validate html for all possible combinations of queries?

        I've used Twisted's templating framework fairly extensively, which does this. I have much less experience with Java templating frameworks, so I don't know of a Java one which has this feature, but I'd be surprised if one didn't already exist.

        All this said, my aim is honestly not to critique Hamlet. Like Eli, I assume it's a good piece of work. I'm just trying to make sure that other pre-existing options were thoroughly examined and found to be so insufficient as to warrant writing a new templating engine. In HBASE-3835, Todd describes some of the reasons why he went with Jamon in HBase. Does Hamlet fulfill all of those requirements?

        Luke Lu added a comment -

        I've used Twisted's templating framework fairly extensively, which does this.

        No, it doesn't; it only validates output as XML, not as XHTML strict (see namespace notes etc). Sitebrick is closer but still not close enough.

        As I mentioned in MAPREDUCE-279, I've looked at all the popular jvm web frameworks plus a few others (including Google sitebrick from the Google Wave project). None of the existing systems can enforce project-specific security/style rules. This is also the reason Hamlet currently is Hadoop specific, as it caters to Hadoop's needs, especially MRv2's (even though, IMO, these rules apply to many other projects as well). BTW, XHTML is still harmful.

        Todd describes some of the reasons why he went with Jamon in HBase. Does Hamlet fulfill all of those requirements?

        Yes, plus more reasons that I mentioned above.

        Milind Bhandarkar added a comment -

        I agree with @atm and others. Hamlet does have some unique capabilities, but it seems to me that it need not be hadoop-specific, and will certainly become a maintenance headache later.

        So, like Eli, I would like to ask if making it a separate project, and building a community around it, is something you have considered?

        The other choice is to contribute the per-project style rules and static validation features to an existing popular open source templating engine.

        But I think the approach described by Robert Evans makes the most sense. Hadoop daemons should expose web services that export json/xml, and the GUI should be a client. Hadoop management tools at Yahoo and elsewhere have to scrape HTML, which is the problem that needs to be solved. Policies for tag/style enforcement, html validation etc. will then no longer be Hadoop's concern.

        Luke Lu added a comment -

        Hamlet does have some unique capabilities, but it need not be hadoop-specific, and will certainly become a maintenance headache later.

        Hamlet is a light-weight pure java framework with a much smaller code base than JSP itself. IMO, webapps created with Hamlet are much more maintainable than those built with any existing web framework because of Java/IDE/refactor friendliness.

        But I think the approach described by Robert Evans makes the most sense.

        Except it doesn't really work in the context of MRv2. See my reply to Bobby in MAPREDUCE-2858 for details.

        Arun C Murthy added a comment -

        Let's use MAPREDUCE-2863 to discuss web-services; it seems a very reasonable suggestion. I believe HDFS already supports it.

        Robert Joseph Evans added a comment -

        Web services are one thing, and the GUI is something else entirely, especially for the Application Master. Web services are good and they often make writing a dynamic web GUI much simpler, but having them does not change my original question. Should we be supporting a GUI ON the Application Master at all?

        Having a GUI run on a separate box from the AM (even on a Gateway if the user wants to have new and innovative visualizations) I think keeps our priorities straight and offers a lot of flexibility. I would much rather see a full featured API over a full featured GUI. I don't want to have to write something that is brittle to pull data out of html when I could call a simple API/web service to get it. I also would prefer not to have to go to two sources to get the information we need, but that is a comment that should probably go on MAPREDUCE-2863. I am not anti GUI. I just think that jumping through a bunch of hoops, especially security hoops, to get the GUI to work seems like a lot of effort that I would rather see put into making an API (or web service) that has all of the functionality needed to support Hadoop's users and provide a simple GUI that can be extended later to be more full featured.

        Luke Lu added a comment -

        Should we be supporting a GUI ON the Application Master at all?

        Yes. See my response in MAPREDUCE-2858 for reasons.

        Having a GUI run on a separate box from the AM (even on a Gateway if the user wants to have new and innovative visualizations) I think keeps our priorities straight and offers a lot of flexibility.

        This is just trying to punt the webapp/UI management to Hadoop users. As I said in MAPREDUCE-2858, this is clearly not scalable and not secure (you can't just link to an arbitrary webapp in the RM app list). It's an operations nightmare to let users run arbitrary web UIs on a gateway. With MRv2, Hadoop becomes a fairly general-purpose cloud operating system; running per user/version web UIs in a scalable and secure manner is a major advantage/feature that we should not shy away from.

        Robert Joseph Evans added a comment -

        Yes. see my response in MAPREDUCE-2858 for reasons.

        I read the comment in MAPREDUCE-2858 before writing my previous comment. You want to be able to run different versions of the UI at the same time. That is great, but I argue it is not worth the development effort at this time to support it, and there are more standard ways to support bucket testing of different UIs on a trusted server than trying to launch a separate web server for each application that is running. I am not suggesting that we set up a trusted server for each and every version/user. That is not scalable. What I am saying is: provide a simple GUI that uses the client APIs to pull the data back.

        you can't just link to an arbitrary webapp in RM app list

        I agree completely. I don't feel secure linking to an arbitrary webapp, whether it is running on a gateway or as part of an application master. If it is controlled completely by an arbitrary user, it is essentially impossible to strip out all potentially malicious content in HTML/javascript. That is why the RM app would link to this basic GUI on a trusted server running trusted code.

        It's an operations nightmare to let users to run arbitrary web UI on a gateway

        That is probably true, and is a great argument to get a webservice up and running so the enhanced visualizations/analytics could all be done on the user's desktop.

        In any case, per user/version web UI is a major step forward, just like per user/version mapreduce runtime in MRv2.

        I disagree. I think most users will not want to take the time to update the GUI or try out new/interesting visualizations. What most of the users I have talked to want is access to the data about their processes. They want to do advanced analysis themselves, on data that is accessible outside of the GUI for a single MR job. This is so they can do their own monitoring and alerting. This is so they can analyze the trends in all their processes, not just the ones that are currently running, and try to understand what they can do to be more efficient.

        MRv2 opens up the door for a set of developers to create new and interesting ways, including one-off solutions, to process data and solve problems. I am just not convinced that any of their problems really involve the Application Master GUI.

        IMO, the current MR history server is the ideal candidate for data and UI separation/refactor.

        ditto.

        Luke Lu added a comment -

        First, some historical perspective: I actually proposed the WS + trusted webapp as an option when I originally architected the AM webapps 8 months ago. The people following the discussion at the time picked the current option. In retrospect, I think the current option is more in line with the spirit of MRv2.

        That is great, but I argue is not worth the development effort at this time to support it.

        The development effort to support generic AM webapp security is optional and incremental, and costs much less than re-architecting the whole thing, period.

        There are more standard ways to support bucket testing of different UIs on a trusted server then trying to launch a separate web server for each application that is running.

        We're not talking about bucket testing here. We're talking about a major benefit of MRv2: allowing different organizations to move to different versions of different types of apps on their own schedule. We're not talking about just mapreduce, which is only one type of app. The WS + trusted UI approach requires O(T*V) trusted servers (T = number of types of apps, V = number of versions of apps), which is clearly not scalable. MAPREDUCE-2858 only requires O(1) trusted servers.
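
        To make the scaling argument concrete, here is a rough sketch in Java. The class and method names are hypothetical, the counts of app types and versions are made-up illustrative numbers, and the O(1) case is modeled as a single trusted server purely as an assumption.

```java
// Illustrative only: compares how the number of trusted servers grows
// under the two approaches discussed in this comment.
final class TrustedServerCount {

    // WS + trusted UI: assume one trusted server per (app type, app version) pair.
    static long wsPlusTrustedUi(long appTypes, long versionsPerType) {
        return Math.multiplyExact(appTypes, versionsPerType);
    }

    // Generic AM webapp security: a constant number of trusted servers,
    // modeled here as exactly 1 (an assumption; O(1) only means "constant").
    static long genericAmWebappSecurity(long appTypes, long versionsPerType) {
        return 1L;
    }

    public static void main(String[] args) {
        long appTypes = 3;        // hypothetical: e.g. mapreduce plus two other app types
        long versionsPerType = 4; // hypothetical: versions in use at the same time

        System.out.println("WS + trusted UI:     "
                + wsPlusTrustedUi(appTypes, versionsPerType));
        System.out.println("Generic AM security: "
                + genericAmWebappSecurity(appTypes, versionsPerType));
    }
}
```

        Adding one more app type under the first approach adds V more trusted servers, while the second count stays the same.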

        It's an operations nightmare to let users to run arbitrary web UI on a gateway

        That is probably true, and is great argument to get a webservice up and running so the enhanced visualizations/analytics could all be done on the users desktop .

        That's an interesting conclusion. History shows that we're moving to webapps because desktop apps are a nightmare to upgrade/manage/secure.

        What most of the users I have talked to want is access to the data about their processes. They want to do advanced analysis themselves that is accessible outside of the GUI for a single MR job.

        That's because they didn't have any control over the GUI at all in MRv1. MRv2 empowers users to do the whole damn thing in a cloud/grid/cluster efficiently/reliably/securely.


          People

          • Assignee:
            Luke Lu
            Reporter:
            Luke Lu
          • Votes:
            0
            Watchers:
            16

            Dates

            • Created:
              Updated:

              Development