Pig / PIG-2855

Provide a method to measure time spent in UDFs

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Release Note:
      New Feature: Timing your UDFs

      The first step to improving performance and efficiency is measuring where the time is going. Pig provides a lightweight method for approximately measuring how much time is spent in different user-defined functions (UDFs) and Loaders. Simply set the pig.udf.profile property to true. This causes new counters to be tracked for all Map-Reduce jobs generated by your script: approx_microsecs measures the approximate amount of time spent in a UDF, and approx_invocations measures the approximate number of times the UDF was invoked. Note that this may produce a large number of counters (two per UDF). An excessive number of counters can lead to poor JobTracker performance, so use this feature carefully, and preferably on a test cluster.

      Description

      When debugging slow jobs, it is often useful to know whether time is being spent in UDFs, and in which UDFs. This is easy to measure from within the framework, so we should let users optionally track these metrics.
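
      To illustrate why such counters are only approximate, here is a minimal Java sketch of one way a framework could track them cheaply: time only every N-th call and scale the result. This is not taken from the attached patches. The class name SampledUdfTimer, the SAMPLE_EVERY interval, and the in-memory counter map are all hypothetical stand-ins (a real job would report to Hadoop counters instead).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch, not Pig's implementation: wraps a function (standing in
 * for a UDF) and keeps approximate counters by timing only every
 * SAMPLE_EVERY-th call and scaling the measurement up.
 */
class SampledUdfTimer<I, O> {
    // Assumed sampling interval; chosen here purely for illustration.
    static final int SAMPLE_EVERY = 100;

    private final Function<I, O> udf;
    // Stand-in for job counters; names follow the release note.
    private final Map<String, Long> counters = new HashMap<>();
    private long calls = 0;

    SampledUdfTimer(Function<I, O> udf) {
        this.udf = udf;
        counters.put("approx_microsecs", 0L);
        counters.put("approx_invocations", 0L);
    }

    O apply(I input) {
        // Calls 0, SAMPLE_EVERY, 2 * SAMPLE_EVERY, ... are the timed samples.
        boolean sample = (calls++ % SAMPLE_EVERY == 0);
        if (!sample) {
            return udf.apply(input); // untimed fast path
        }
        long start = System.nanoTime();
        try {
            return udf.apply(input);
        } finally {
            long micros = (System.nanoTime() - start) / 1000;
            // Each sample is assumed to represent SAMPLE_EVERY calls.
            counters.merge("approx_invocations", (long) SAMPLE_EVERY, Long::sum);
            counters.merge("approx_microsecs", micros * SAMPLE_EVERY, Long::sum);
        }
    }

    long counter(String name) {
        return counters.getOrDefault(name, 0L);
    }
}
```

      With this sketch, wrapping String::length and calling it 250 times takes three timed samples, so approx_invocations reads 300 rather than 250. That trade of accuracy for low overhead is the kind of approximation the counter names hint at.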

      1. PIG-2855.patch
        9 kB
        Dmitriy V. Ryaboy
      2. PIG-2855.2.patch
        9 kB
        Dmitriy V. Ryaboy

        Activity

        Dmitriy V. Ryaboy created issue -
        Dmitriy V. Ryaboy made changes -
        Field Original Value New Value
        Attachment PIG-2855.patch [ 12538867 ]
        Dmitriy V. Ryaboy made changes -
        Assignee Dmitriy V. Ryaboy [ dvryaboy ]
        Dmitriy V. Ryaboy made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Dmitriy V. Ryaboy made changes -
        Attachment PIG-2855.2.patch [ 12538873 ]
        Dmitriy V. Ryaboy made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.11 [ 12318878 ]
        Resolution Fixed [ 1 ]
        Dmitriy V. Ryaboy made changes -
        Release Note New Feature: Timing your UDFs
        Bill Graham made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Dmitriy V. Ryaboy
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
