Thanks for the review.
Your renaming script will not work while committing too, because other classes still refer to them using the old names. You should generate a patch after renaming the classes yourselves. Classes that need to be renamed to reflect the word 'Resource' are: MemoryCalculator and all its sub-classes and TTMemoryReporting
I see. I will upload a patch that also renames the classes and methods. I will globally check 'Memory' and replace them with 'Resource' if it is necessary.
Further, MemoryCalculatorPlugin.java and the its subclasses are all public. So, you still should retain the old classes, deprecate them and redirect all the old functionality to the new Resource* classes.
I see. This way I don't have to rename so many things.
TaskTracker.java: Creation of the plugin should be taken out of initializeMemoryManagement() (+3471 through +3491) into initialize() method.
TTConfig.java: TT_MEMORY_CALCULATOR_PLUGIN needs to be renamed too.
Will do the necessary renaming.
TaskTrackerStatus.java: Why isn't cpu-usage serialized and de-serialized directly? Calculating it at the time of de-serialization from cumulativeCpuTime will yield wrong results, I think.
From /proc/stat, we can only get the cumulative time. So the CPU-usage has to be calculated by differencing. And we will always need to sample twice to be able to do the differencing. I do this in the receiving side because this can save a little bandwidth (one double). But you are right. This way the CPU-usage may be affected by the nonuniform delay caused by network transmission. I will move the calculation in the transmitter side and serialize it.
- JIFFY_LENGTH calculation code is duplicated from
MAPREDUCE-1201. Should that be a blocker for this one?
We should use ProcfsBasedProcessTree.JIFFY_LENGTH_IN_MILLIS here. For now I will keep the calculation code here just to make the code readable. After
MAPREDUCE-1201 is committed. I will fix this part.
- In the main method, I didn't understand why we should sleep for 500ms. What is the reason for doing this?
This is because we can only get the cumulative CPU time from /proc/stat. To compute the CPU usage, we need to compute the difference. So we delay a while and take another sample.
- This class has a lot of parsing and calculation code. Though I could verify them, it will be more helpful if we can write tests validating this. For this, we can write dummy /proc files ourselves and call the calculator class's methods similar to some of the TestProcfsBasedProcessTree tests, for e.g., testMemoryForProcessTree(). Can you add some similar tests here too, verifying each of the APIs?
I see. This kind of tests will definitely be helpful. I will work on it. Thanks.