Pig / PIG-626

Statistics (records read by each mapper and reducer)

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.3.0
    • Component/s: impl
    • Labels: None
    • Patch Info: Patch Available

    Description

      This feature uses Hadoop's counters framework. Initially, I am only interested in the number of records read by each mapper and reducer, particularly for the last job in a script. Sample code to access the statistics for the last job:

      // If the last job has no reduce plan, it was map-only, so report the map output count.
      String reducePlan = stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
      if (reducePlan == null) {
          System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
      } else {
          System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
      }

      The patch contains 7 test cases. These include tests for PigStorage and BinStorage, along with one for the case of multiple MR jobs.
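
The lookup in the sample code can be wrapped in a small helper, sketched below. This is a standalone illustration and not part of the patch: it assumes the statistics are exposed as a nested map of job ID to counter name to string value, which is what the chained `get` calls in the sample suggest. The class name `PigStatsSketch`, the method `recordsWritten`, the job IDs, and the counter values are all hypothetical; only the three counter keys come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Pig code base.
final class PigStatsSketch {
    // Counter keys taken from the sample code in this issue.
    static final String REDUCE_PLAN = "PIG_STATS_REDUCE_PLAN";
    static final String MAP_OUTPUT = "PIG_STATS_MAP_OUTPUT_RECORDS";
    static final String REDUCE_OUTPUT = "PIG_STATS_REDUCE_OUTPUT_RECORDS";

    /**
     * Returns the records-written value for the given job, or null if the job
     * is unknown. Assumes stats are shaped as jobId -> (counter name -> value).
     */
    static String recordsWritten(Map<String, Map<String, String>> pigStats, String jobId) {
        Map<String, String> jobStats = pigStats.get(jobId);
        if (jobStats == null) {
            return null;
        }
        // No reduce plan means a map-only job, so the map output is the final output.
        String key = jobStats.get(REDUCE_PLAN) == null ? MAP_OUTPUT : REDUCE_OUTPUT;
        return jobStats.get(key);
    }

    public static void main(String[] args) {
        // Made-up statistics for two hypothetical jobs.
        Map<String, String> mapOnly = new HashMap<>();
        mapOnly.put(MAP_OUTPUT, "10");

        Map<String, String> mapReduce = new HashMap<>();
        mapReduce.put(REDUCE_PLAN, "placeholder reduce plan");
        mapReduce.put(MAP_OUTPUT, "10");
        mapReduce.put(REDUCE_OUTPUT, "4");

        Map<String, Map<String, String>> pigStats = new HashMap<>();
        pigStats.put("job_map_only", mapOnly);
        pigStats.put("job_map_reduce", mapReduce);

        System.out.println("Records written : " + recordsWritten(pigStats, "job_map_only"));
        System.out.println("Records written : " + recordsWritten(pigStats, "job_map_reduce"));
    }
}
```

Fetching the per-job map once also avoids repeating the `stats.getPigStats().get(stats.getLastJobID())` chain in each branch.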

      Attachments

        1. pigStats.patch
          53 kB
          Shubham Chopra
        2. pigStats.patch
          53 kB
          Shubham Chopra
        3. TEST-org.apache.pig.test.TestBZip.txt
          31 kB
          Alan Gates
        4. pigStats.patch
          70 kB
          Shubham Chopra
        5. pigStats.patch
          60 kB
          Shubham Chopra
        6. pigStats.patch
          64 kB
          Shubham Chopra
        7. PIG-626.patch
          64 kB
          Alan Gates

          People

            Assignee: Shubham Chopra (shubhamc)
            Reporter: Shubham Chopra (shubhamc)