Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4203

Create equivalent of ProcfsBasedProcessTree for Windows

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1-win
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Added an implementation of the process tree for Windows.

      Description

      ProcfsBasedProcessTree is used by the TaskTracker to get process information like memory and cpu usage. This information is used to manage resources etc. The current implementation is based on Linux procfs functionality and hence does not work on other platforms, specifically windows.

      1. test.cpp
        6 kB
        Bikas Saha
      2. MAPREDUCE-4203.patch
        22 kB
        Bikas Saha
      3. MAPREDUCE-4203.branch-1-win.4.patch
        35 kB
        Bikas Saha
      4. MAPREDUCE-4203.branch-1-win.3.patch
        35 kB
        Bikas Saha
      5. MAPREDUCE-4203.branch-1-win.2.patch
        32 kB
        Bikas Saha
      6. MAPREDUCE-4203.branch-1-win.1.patch
        31 kB
        Bikas Saha

        Issue Links

          Activity

          Hide
          Arun C Murthy added a comment -

          I just committed this. Thanks Bikas!

          Show
          Arun C Murthy added a comment - I just committed this. Thanks Bikas!
          Hide
          Ivan Mitic added a comment -

          +1, change looks good, thanks Bikas.

          Show
          Ivan Mitic added a comment - +1, change looks good, thanks Bikas.
          Hide
          Chuan Liu added a comment -

          +1 Thanks!

          Show
          Chuan Liu added a comment - +1 Thanks!
          Hide
          Bikas Saha added a comment -

          Attaching new patch with error mesg change and wide char usage.

          Show
          Bikas Saha added a comment - Attaching new patch with error mesg change and wide char usage.
          Hide
          Bikas Saha added a comment -

          ok. for this specific case I will just print an error message inline.

          Show
          Bikas Saha added a comment - ok. for this specific case I will just print an error message inline.
          Hide
          Ivan Mitic added a comment -

          Thanks for addressing the comments Bikas.

          2.For ReportErrorCode(), it is because we are calling FormatMessageW() inside the function to retrieval the system error code message inside the function. If you pass in EXIT_FAILURE (1), it will print error message of ERROR_INVALID_FUNCTION (1).

          Ah, the issue here is that CallNtPowerInformation returns NTSTATUS instead of WINERROR code. We would need to define a new ReportNtErrorCode() function that would be able to translate the NTSTATUS code into the message. The implementation would be similar to ReportErrorCode(), check out this MSDN link. I don't mind if you'd like to have a separate Jira on this.

          Show
          Ivan Mitic added a comment - Thanks for addressing the comments Bikas. 2.For ReportErrorCode(), it is because we are calling FormatMessageW() inside the function to retrieval the system error code message inside the function. If you pass in EXIT_FAILURE (1), it will print error message of ERROR_INVALID_FUNCTION (1). Ah, the issue here is that CallNtPowerInformation returns NTSTATUS instead of WINERROR code. We would need to define a new ReportNtErrorCode() function that would be able to translate the NTSTATUS code into the message. The implementation would be similar to ReportErrorCode() , check out this MSDN link . I don't mind if you'd like to have a separate Jira on this.
          Hide
          Chuan Liu added a comment -
          1. I would prefer the secure and generic versions unless you have a strong preference for the wide char version

            You can use fwprintf_s(). I am fine if you want to stay with tchar version. It is just implementation wise, tchar functions are always mapped to either Unicode/wchar version or ANSI version. E.g. we have the following definition in "tchar.h". Because we have explicitly indicate we only support Unicode. I think it more clear if we always use wide char version of the functions. Either way, since they are equivalent, it is your call.

            #define _ftprintf_s     fwprintf_s
            
          2. For ReportErrorCode(), it is because we are calling FormatMessageW() inside the function to retrieval the system error code message inside the function. If you pass in EXIT_FAILURE (1), it will print error message of ERROR_INVALID_FUNCTION (1).
          Show
          Chuan Liu added a comment - I would prefer the secure and generic versions unless you have a strong preference for the wide char version You can use fwprintf_s(). I am fine if you want to stay with tchar version. It is just implementation wise, tchar functions are always mapped to either Unicode/wchar version or ANSI version. E.g. we have the following definition in "tchar.h". Because we have explicitly indicate we only support Unicode. I think it more clear if we always use wide char version of the functions. Either way, since they are equivalent, it is your call. #define _ftprintf_s fwprintf_s For ReportErrorCode(), it is because we are calling FormatMessageW() inside the function to retrieval the system error code message inside the function. If you pass in EXIT_FAILURE (1), it will print error message of ERROR_INVALID_FUNCTION (1).
          Hide
          Bikas Saha added a comment -

          1) I would prefer the secure and generic versions unless you have a strong preference for the wide char version
          2) I re-used ReportErrorCode so that I dont have to guess where the error reporting is going - stderr or some log file. So I used the standard ReportErrorCode() function that takes care of it globally.

          Show
          Bikas Saha added a comment - 1) I would prefer the secure and generic versions unless you have a strong preference for the wide char version 2) I re-used ReportErrorCode so that I dont have to guess where the error reporting is going - stderr or some log file. So I used the standard ReportErrorCode() function that takes care of it globally.
          Hide
          Chuan Liu added a comment -

          Bikas Saha, change looks good! I just have following two minor suggestions for 'SystemInfo.c' based on branch-1-win.3.patch.

          1. Use fwprintf() instead of _ftprintf_s(). Because we have UNICODE defined, they should be the same. It is just fwprintf() makes things more clear in my opinion.
          2. ReportErrorCode() was designed to be used with system error codes. So it would be better to just print an custom error message instead of calling it with EXIT_FAILURE as follows.
            ReportErrorCode(L"CallNtPowerInformation", EXIT_FAILURE);
            
          Show
          Chuan Liu added a comment - Bikas Saha , change looks good! I just have following two minor suggestions for 'SystemInfo.c' based on branch-1-win.3.patch. Use fwprintf() instead of _ftprintf_s(). Because we have UNICODE defined, they should be the same. It is just fwprintf() makes things more clear in my opinion. ReportErrorCode() was designed to be used with system error codes. So it would be better to just print an custom error message instead of calling it with EXIT_FAILURE as follows. ReportErrorCode(L "CallNtPowerInformation" , EXIT_FAILURE);
          Hide
          Bikas Saha added a comment -

          Attaching patch after review comments.

          Show
          Bikas Saha added a comment - Attaching patch after review comments.
          Hide
          Bikas Saha added a comment -

          ProcfsBasedProcessTree uses procs.getValue().updateAge(oldInfo) to update the age. updateAge() explicitly sets age = oldAge+1. So there is no double increment.

          Ah.. sorry. I confused main() with Task(). There is no wrapper around SystemInfo() like Task(). Will add the error mesg.

          Show
          Bikas Saha added a comment - ProcfsBasedProcessTree uses procs.getValue().updateAge(oldInfo) to update the age. updateAge() explicitly sets age = oldAge+1. So there is no double increment. Ah.. sorry. I confused main() with Task(). There is no wrapper around SystemInfo() like Task(). Will add the error mesg.
          Hide
          Ivan Mitic added a comment -

          ProcessInfo.age initializes to 1. So newpInfo.age += oldpInfo.age does what you say apart from allowing newpInfo to have a different age too.

          I see. It seems that they are over-incrementing the value in ProcfsBasedProcessTree as the default value is also one.

          The main calling function will do that with the return code of this function.

          I could not find this...would you mind checking?

          Thanks!

          Show
          Ivan Mitic added a comment - ProcessInfo.age initializes to 1. So newpInfo.age += oldpInfo.age does what you say apart from allowing newpInfo to have a different age too. I see. It seems that they are over-incrementing the value in ProcfsBasedProcessTree as the default value is also one. The main calling function will do that with the return code of this function. I could not find this...would you mind checking? Thanks!
          Hide
          Bikas Saha added a comment -

          1) ProcessInfo.age initializes to 1. So newpInfo.age += oldpInfo.age does what you say apart from allowing newpInfo to have a different age too.
          2) In the real use case, setProcessPid() is called to set the pid. But there are a bunch of gridmix/ tests that depend on the old behavior of snooping the JVM_PID from the env var on the fly and I did not want to change all of them. So I added this to keep the tests passing without change.
          3) Forgot to add that. Will do. Thanks!
          4) Thats strange because I can see that the values are not zero and in fact similar to values shown in TaskManager. Will change to PrivateUsage anyways to be safe.
          5) Will do.
          6) The main calling function will do that with the return code of this function.

          Show
          Bikas Saha added a comment - 1) ProcessInfo.age initializes to 1. So newpInfo.age += oldpInfo.age does what you say apart from allowing newpInfo to have a different age too. 2) In the real use case, setProcessPid() is called to set the pid. But there are a bunch of gridmix/ tests that depend on the old behavior of snooping the JVM_PID from the env var on the fly and I did not want to change all of them. So I added this to keep the tests passing without change. 3) Forgot to add that. Will do. Thanks! 4) Thats strange because I can see that the values are not zero and in fact similar to values shown in TaskManager. Will change to PrivateUsage anyways to be safe. 5) Will do. 6) The main calling function will do that with the return code of this function.
          Hide
          Ivan Mitic added a comment -

          Change looks good Bikas!

          I have some minor comments/questions/suggestions:
          1. WindowsBasedProcessTree.java:114: Should the new age be oldAge+1 same as in ProfsBasedProcessTree?
          2. LinuxResourceCalculatorPlugin.java:399: Do we need this code:

                if(processPid == null) {
                  // process pid not set. try to obtain on our own
                  processPid = System.getenv().get("JVM_PID");
                }
          

          processPid should have been initialized. If not initialized in all cases, we might have a problem on Windows. To harden the code, might make sense to add a null check for processPid in WindowsResourceCalculatorPlugin#getProcResourceValues and log an error if processPid is undefined.
          3. Should ResourceCalculatorPlugin#getResourceCalculatorPlugin() fallback to WindowsResourceCalculatorPlugin on Windows?
          4. task.c:335 From documentation, it seems that PagefileUsage is always zero on Windows Server 2008, use PrivateUsage instead?
          5. main.c:100: Would you mind updating Usage() function with information on "systeminfo"? Please include information on the format of the output, as it’s hard to figure out. Same comment for task.c!TaskUsage()
          6. systeminfo.c: Can you please print error info via ReportErrorCode() when calls into Windows APIs fail?

          Show
          Ivan Mitic added a comment - Change looks good Bikas! I have some minor comments/questions/suggestions: 1. WindowsBasedProcessTree.java:114: Should the new age be oldAge+1 same as in ProfsBasedProcessTree? 2. LinuxResourceCalculatorPlugin.java:399: Do we need this code: if (processPid == null ) { // process pid not set. try to obtain on our own processPid = System .getenv().get( "JVM_PID" ); } processPid should have been initialized. If not initialized in all cases, we might have a problem on Windows. To harden the code, might make sense to add a null check for processPid in WindowsResourceCalculatorPlugin#getProcResourceValues and log an error if processPid is undefined. 3. Should ResourceCalculatorPlugin#getResourceCalculatorPlugin() fallback to WindowsResourceCalculatorPlugin on Windows? 4. task.c:335 From documentation, it seems that PagefileUsage is always zero on Windows Server 2008, use PrivateUsage instead? 5. main.c:100: Would you mind updating Usage() function with information on "systeminfo"? Please include information on the format of the output, as it’s hard to figure out. Same comment for task.c!TaskUsage() 6. systeminfo.c: Can you please print error info via ReportErrorCode() when calls into Windows APIs fail?
          Hide
          Bikas Saha added a comment -

          Fix some bugs in formatting.
          TestTaskTrackerMemoryManager now passes on Windows and tests the feature functionally.

          Show
          Bikas Saha added a comment - Fix some bugs in formatting. TestTaskTrackerMemoryManager now passes on Windows and tests the feature functionally.
          Hide
          Jonathan Eagles added a comment -

          Thanks, Bikas. Just trying to prevent Hadoop code from being contaminated with GPL or proprietary code licenses. Sounds like you are already controlling for that.

          Show
          Jonathan Eagles added a comment - Thanks, Bikas. Just trying to prevent Hadoop code from being contaminated with GPL or proprietary code licenses. Sounds like you are already controlling for that.
          Hide
          Bikas Saha added a comment -

          Sorry, I should have been more clear. winutils is a native executable that gets built on Windows. The source for that got added in HADOOP-8235. It provides key functionality like permissions, groups, hard links etc that are required to run Hadoop on Windows. This patch adds more support by adding functionality to winutils to monitor memory etc for tasks.

          Show
          Bikas Saha added a comment - Sorry, I should have been more clear. winutils is a native executable that gets built on Windows. The source for that got added in HADOOP-8235 . It provides key functionality like permissions, groups, hard links etc that are required to run Hadoop on Windows. This patch adds more support by adding functionality to winutils to monitor memory etc for tasks.
          Hide
          Jonathan Eagles added a comment -

          Bikas, when you say based on winutils, it makes me think you are basing this patch on some existing code. If so, can you comment on the source and licensing.

          Show
          Jonathan Eagles added a comment - Bikas, when you say based on winutils, it makes me think you are basing this patch on some existing code. If so, can you comment on the source and licensing.
          Hide
          Bikas Saha added a comment -

          Attaching patch based on winutils.

          Show
          Bikas Saha added a comment - Attaching patch based on winutils.
          Hide
          Mahadevan Venkatraman added a comment -

          This is great.
          Could you propagate the error code(s) from GetLastError() call for the printProcessTree() and printSystemInfo() in the error paths?

          Show
          Mahadevan Venkatraman added a comment - This is great. Could you propagate the error code(s) from GetLastError() call for the printProcessTree() and printSystemInfo() in the error paths?
          Hide
          Bikas Saha added a comment -

          Initial patch with refactorings/new Windows class/tests and native windows code to get resource information.

          Show
          Bikas Saha added a comment - Initial patch with refactorings/new Windows class/tests and native windows code to get resource information.

            People

            • Assignee:
              Bikas Saha
              Reporter:
              Bikas Saha
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development