Pig / PIG-1102

Collect number of spills per job

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

      Description

      Memory shortage is one of the main performance issues in Pig. Knowing when we spill to disk is useful for understanding query performance and for seeing how certain changes in Pig affect it.

      Other interesting statistics to collect would be average CPU usage and maximum memory usage, but I am not sure whether this information is easily retrievable.

      Using Hadoop counters for this would make sense.
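
      For illustration, here is a minimal sketch of the counter mechanism (not part of any attached patch; the mapper class and the SpillCounters enum are hypothetical). A task increments a custom counter, and Hadoop aggregates the values per job and shows them in the job UI:

          import java.io.IOException;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapred.MapReduceBase;
          import org.apache.hadoop.mapred.Mapper;
          import org.apache.hadoop.mapred.OutputCollector;
          import org.apache.hadoop.mapred.Reporter;

          public class SpillCountingMapper extends MapReduceBase
                  implements Mapper<LongWritable, Text, Text, LongWritable> {

              // Hypothetical counter enum; Pig defines its real counters elsewhere.
              public enum SpillCounters { SPILL_COUNT }

              public void map(LongWritable key, Text value,
                              OutputCollector<Text, LongWritable> output, Reporter reporter)
                      throws IOException {
                  boolean spilled = false;  // placeholder for "a spill just happened"
                  if (spilled) {
                      // Hadoop aggregates counters across all tasks of the job.
                      reporter.incrCounter(SpillCounters.SPILL_COUNT, 1);
                  }
                  output.collect(value, key);
              }
          }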

      Attachments

      1. PIG_1102.patch (10 kB), Sriranjan Manjunath
      2. PIG_1102.patch.1 (11 kB), Sriranjan Manjunath

        Activity

        Sriranjan Manjunath added a comment -

        Hadoop currently does not provide average CPU usage or memory usage per job. It does not even provide the number of spills per job. I have created a JIRA requesting this: https://issues.apache.org/jira/browse/MAPREDUCE-1257

        The only information we can currently gather is the number of spill records.

        Sriranjan Manjunath added a comment -

        There are no test cases included in the patch since it was difficult to trigger a spill consistently in a unit test. I have manually tested the change. The easiest way to test it is to load a huge data bag (1 GB or so) and watch the MapReduce UI. The UI will show the new counters, SPILLABLE_MEMORY_MANAGER_SPILL_COUNT or PROACTIVE_SPILL_COUNT, depending on the type of POPackage used.
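
        For completeness, counters like these can also be read programmatically once the job finishes. A minimal sketch using the org.apache.hadoop.mapred API (the group and counter name strings below are assumptions; the real names come from Pig's counter definitions):

            import java.io.IOException;
            import org.apache.hadoop.mapred.Counters;
            import org.apache.hadoop.mapred.RunningJob;

            public class PrintSpillCounters {
                // 'job' is assumed to be the handle of the completed MapReduce job
                // launched for the Pig script.
                public static void printSpills(RunningJob job) throws IOException {
                    Counters counters = job.getCounters();
                    long proactive = counters
                        .findCounter("PigCounters", "PROACTIVE_SPILL_COUNT").getCounter();
                    long managed = counters
                        .findCounter("PigCounters", "SPILLABLE_MEMORY_MANAGER_SPILL_COUNT").getCounter();
                    System.out.println("Proactive spills: " + proactive);
                    System.out.println("SpillableMemoryManager spills: " + managed);
                }
            }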

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428253/PIG_1102.patch
        against trunk revision 891499.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 406 release audit warnings (more than the trunk's current 403 warnings).

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/console

        This message is automatically generated.

        Sriranjan Manjunath added a comment -

        The release audit warnings are related to HTML files.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428356/PIG_1102.patch
        against trunk revision 892125.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 400 release audit warnings (more than the trunk's current 397 warnings).

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/console

        This message is automatically generated.

        Olga Natkovich added a comment -

        I will be reviewing this patch.

        Sriranjan Manjunath added a comment -

        I ran the test again on my local machine, and it passes. The test failed because of "too many open file descriptors". Is this a Hudson-related issue?

        Olga Natkovich added a comment -

        A few questions/comments on the patch:

        (1) I think the count should default to 0, not -1.
        (2) Does the increment of the count have to be combined with a warn statement? Does this mean that users will see that many warnings? If so, should we combine this with the spill message we already print?
        (3) I thought we discussed incrementing per buffer, not per record, and approximating that based on the buffer size. I did not see the code that does this.
        (4) I don't think you correctly separated bags that proactively spill from bags that are spilled by the memory manager. All the bags created by DefaultBagFactory get registered with SpillableMemoryManager and belong to the second category.

        Sriranjan Manjunath added a comment -

        1. The default is -1 to distinguish it from the case where there is no spill. The value is set to -1 if the counters could not be initialized, which is an exceptional case.
        2. "warn" is a misnomer. I reused an existing function which updates the counter if it has been initialized; if the counter is not initialized, it emits a warning.
        3, 4. I have fixed these and submitted a new patch.
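
        For reference, a generic sketch of the pattern described in (2), using hypothetical names (this is not the actual helper in the patch): the counter is incremented when the reporter is available, and a warning is logged otherwise.

            import org.apache.commons.logging.Log;
            import org.apache.commons.logging.LogFactory;
            import org.apache.hadoop.mapred.Reporter;

            public class SpillCounterHelper {
                private static final Log LOG = LogFactory.getLog(SpillCounterHelper.class);

                // Increment the counter if the reporter has been initialized,
                // otherwise dump a warning instead of losing the event silently.
                public static void incSpillCount(Reporter reporter, Enum<?> counter) {
                    if (reporter != null) {
                        reporter.incrCounter(counter, 1);
                    } else {
                        LOG.warn("Spill counters not initialized; spill not recorded for " + counter);
                    }
                }
            }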

        Olga Natkovich added a comment -

        I could not figure out how the patch addresses (3). Could you please explain?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428760/PIG_1102.patch.1
        against trunk revision 893053.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 411 release audit warnings (more than the trunk's current 408 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/console

        This message is automatically generated.

        Sriranjan Manjunath added a comment -

        (3) refers to the case where we try to guess the number of records that fit into memory and start spilling the other records. InternalCachedBag.java addresses this case:

        if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
            /* Increment the spill count */
            incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT);
        }

        cacheLimit holds the number of records that can be held in memory, whereas mContents holds all the records. Here, I do not increment the counter for every record; instead I count every nth record, n being the cacheLimit.

        This, however, does not increment the counter by the buffer size. Incrementing it by the buffer size would give us a value approximately equal to the number of spilled records.
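
        For illustration, a minimal sketch of that variant (the two-argument incSpillCount overload is hypothetical and not part of the attached patch): incrementing by cacheLimit on each spill event makes the aggregated counter approximate the number of spilled records rather than the number of spill events.

            if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
                // Hypothetical overload: bump the counter by the buffer size
                // so the total approximates the number of spilled records.
                incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT, cacheLimit);
            }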

        Olga Natkovich added a comment -

        Patch committed. Thanks, Sri!


          People

          • Assignee: Sriranjan Manjunath
          • Reporter: Olga Natkovich
          • Votes: 0
          • Watchers: 0
