Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1758

Web UI JSP pages thread safety issue

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0
    • Fix Version/s: 0.20.204.0
    • Component/s: tools
    • Labels:
      None
    • Environment:

      branch-20-security

    • Hadoop Flags:
      Reviewed

      Description

      The set of JSP pages that web UI uses are not thread safe. We have observed some problems when requesting Live/Dead/Decommissioning pages from the web UI, incorrect page is displayed. To be more specific, requesting Dead node list page, sometimes, Live node page is returned. Requesting decommissioning page, sometimes, dead page is returned.

      The root cause of this problem is that JSP page is not thread safe by default. When multiple requests come in, each request is assigned to a different thread, multiple threads access the same instance of the servlet class resulted from a JSP page. A class variable is shared by multiple threads. The JSP code in 20 branche, for example, dfsnodelist.jsp has

      <!%
        int rowNum = 0;
        int colNum = 0;
        String sorterField = null;
        String sorterOrder = null;
        String whatNodes = "LIVE";
        ...
      %>
      

      declared as class variables. ( These set of variables are declared within <%! code %> directives which made them class members. ) Multiple threads share the same set of class member variables, one request would step on anther's toe.

      However, due to the JSP code refactor, HADOOP-5857, all of these class member variables are moved to become function local variables. So this bug does not appear in Apache trunk. Hence, we have proposed to take a simple fix for this bug on 20 branch alone, to be more specific, branch-0.20-security.

      The simple fix is to add jsp ThreadSafe="false" directive into the related JSP pages, dfshealth.jsp and dfsnodelist.jsp to make them thread safe, i.e. only on request is processed at each time.

      We did evaluate the thread safety issue for other JSP pages on trunk, we noticed a potential problem is that when we retrieving some statistics from namenode, for example, we make the call to

      NamenodeJspHelper.getInodeLimitText(fsn);
      

      in dfshealth.jsp, which eventuality is

        static String getInodeLimitText(FSNamesystem fsn) {
          long inodes = fsn.dir.totalInodes();
          long blocks = fsn.getBlocksTotal();
          long maxobjects = fsn.getMaxObjects();
          ....
      

      some of the function calls are already guarded by readwritelock, e.g. dir.totalInodes, but others are not. As a result of this, the web ui results are not 100% thread safe. But after evaluating the prons and cons of adding a giant lock into the JSP pages, we decided not to issue FSNamesystem ReadWrite locks into JSPs.

      1. HDFS-1758.patch
        0.7 kB
        Tanping Wang

        Activity

        Hide
        Luke Lu added a comment -

        1. Setting isThreadSafe=false will cause these JSPs to be compiled to servlets that implement SingleThreadModel, which deprecated in servlet-api-2.4+ (we use servlet-api-2.5.), as it doesn't really ensure thread-safety in general. It's probably fine in this particular case (no static/session variables are used), if we are deprecating the 0.20 branches soon.

        2. Since we don't have word-tearing issues in conforming JVMs (esp. on x86), and that the locking totalInodes call is the issued first, it'll trigger a memory barrier, which makes the subsequent reads up to date enough for the stats displaying purpose.

        3. Our new web framework makes this type errors impossible.

        Show
        Luke Lu added a comment - 1. Setting isThreadSafe=false will cause these JSPs to be compiled to servlets that implement SingleThreadModel, which deprecated in servlet-api-2.4+ (we use servlet-api-2.5.), as it doesn't really ensure thread-safety in general. It's probably fine in this particular case (no static/session variables are used), if we are deprecating the 0.20 branches soon. 2. Since we don't have word-tearing issues in conforming JVMs (esp. on x86), and that the locking totalInodes call is the issued first, it'll trigger a memory barrier, which makes the subsequent reads up to date enough for the stats displaying purpose. 3. Our new web framework makes this type errors impossible.
        Hide
        Tanping Wang added a comment -

        Hi, Luke,
        Thanks for the explanations and comments. The third comment of yours,

        3. Our new web framework makes this type errors impossible.

        Are you referring to that after the JSP code refactoring, the problematic class variables have become function local variables, so that this type of errors are not possible any more?

        It looks to me that you have agreed with this fix. Since you are the expert on these issues, can you plus 1 on the Jira then? This would make it pretty convincing.

        Regards,
        Tanping

        Show
        Tanping Wang added a comment - Hi, Luke, Thanks for the explanations and comments. The third comment of yours, 3. Our new web framework makes this type errors impossible. Are you referring to that after the JSP code refactoring, the problematic class variables have become function local variables, so that this type of errors are not possible any more? It looks to me that you have agreed with this fix. Since you are the expert on these issues, can you plus 1 on the Jira then? This would make it pretty convincing. Regards, Tanping
        Hide
        Suresh Srinivas added a comment -

        Luke, this change is being done for 20 release only. On newer release, the JSPs have been cleaned up and this problem does not exist.

        +1 for the patch for release 20.

        Show
        Suresh Srinivas added a comment - Luke, this change is being done for 20 release only. On newer release, the JSPs have been cleaned up and this problem does not exist. +1 for the patch for release 20.
        Hide
        Luke Lu added a comment -

        3. Our new web framework makes this type errors impossible. Are you referring to that after the JSP code refactoring, the problematic class variables have become function local variables, so that this type of errors are not possible any more?

        Actually, it was just a shameless plug for my little web framework (currently being used in mapreduce.next), where you don't have to worry about instance variable thread-safety in controllers and views

        Show
        Luke Lu added a comment - 3. Our new web framework makes this type errors impossible. Are you referring to that after the JSP code refactoring, the problematic class variables have become function local variables, so that this type of errors are not possible any more? Actually, it was just a shameless plug for my little web framework (currently being used in mapreduce.next), where you don't have to worry about instance variable thread-safety in controllers and views
        Hide
        Suresh Srinivas added a comment -

        I committed the patch. Thank you Tanping.

        Show
        Suresh Srinivas added a comment - I committed the patch. Thank you Tanping.
        Hide
        Allen Wittenauer added a comment -

        I'm confused by this commit.

        It looks like it was committed to branch-0.20-security but 0.20.203.1 is listed as the environment. Was this actually supposed to get committed to branch-0.20-security-203 or both or what?

        Show
        Allen Wittenauer added a comment - I'm confused by this commit. It looks like it was committed to branch-0.20-security but 0.20.203.1 is listed as the environment. Was this actually supposed to get committed to branch-0.20-security-203 or both or what?
        Hide
        Tanping Wang added a comment -

        It should only be committed to branch-0.20-security only. HDFS does not have a version that exact matches branch-20-security. The closest was 0.20.203.1.

        Show
        Tanping Wang added a comment - It should only be committed to branch-0.20-security only. HDFS does not have a version that exact matches branch-20-security. The closest was 0.20.203.1.
        Hide
        Tanping Wang added a comment -

        Correct Fix Version/s:
        0.20.204

        Show
        Tanping Wang added a comment - Correct Fix Version/s: 0.20.204
        Hide
        Owen O'Malley added a comment -

        Hadoop 0.20.204.0 was released.

        Show
        Owen O'Malley added a comment - Hadoop 0.20.204.0 was released.

          People

          • Assignee:
            Tanping Wang
            Reporter:
            Tanping Wang
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development