PIG-200: Pig Performance Benchmarks

Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels: None

    Description

      To benchmark Pig performance, we need a TPC-H-like large data set plus a collection of scripts. These would be used to compare different Pig releases and to compare Pig against other systems (e.g. Pig + Hadoop vs. Hadoop only).

      Here is the wiki page for small tests: http://wiki.apache.org/pig/PigPerformance

      I am currently running long-running Pig scripts over data sets on the order of tens of TBs. The next step is hundreds of TBs.

      We need an open large data set (that is, open-source scripts that generate the data set) and detailed scripts for important operations such as ORDER, AGGREGATION, etc.
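
As a rough illustration of what an open data-generation script could look like, here is a minimal Python sketch. It is not the attached generate_data.pl and does not reflect its schema; the three columns (user, action, timespent), the function names, and the default sizes are all hypothetical choices made for this example. A fixed seed keeps the output reproducible so that runs on different Pig releases see identical input.

```python
import random


def generate_rows(num_rows, num_users=100, seed=0):
    """Yield tab-separated rows: user id, action code, time spent (seconds).

    The schema is hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed: the same arguments give the same data
    for _ in range(num_rows):
        user = f"user{rng.randrange(num_users):05d}"
        action = rng.randint(1, 5)       # inclusive range 1..5
        timespent = rng.randint(0, 600)  # inclusive range 0..600
        yield f"{user}\t{action}\t{timespent}"


def write_dataset(path, num_rows, **kwargs):
    """Write the generated rows to a text file, one row per line."""
    with open(path, "w") as out:
        for row in generate_rows(num_rows, **kwargs):
            out.write(row + "\n")
```

Calling write_dataset("page_views.txt", 1000000) would produce a tab-delimited file that a Pig script could load and then GROUP or ORDER; scaling to the TB range would need a parallel generator rather than this single-process loop.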

      We can call these the Pig Workouts: Cardio (short processing), Marathon (long-running scripts), and Triathlon (a mix).

      I will update this JIRA with more details of current activities soon.

      Attachments

        1. generate_data.pl
          10 kB
          Alan Gates
        2. perf.hadoop.patch
          33 kB
          Ying He
        3. perf.patch
          153 kB
          Alan Gates
        4. perf-0.6.patch
          152 kB
          Daniel Dai
        5. pig-0.8.1-vs-0.9.0.png
          8 kB
          Jie Li
        6. PIG-200-0.12.patch
          218 kB
          Daniel Dai
        7. pigmix_pig0.11.patch
          194 kB
          Dmitriy V. Ryaboy
        8. pigmix2.patch
          200 kB
          Daniel Dai

            People

              Assignee: Alan Gates (gates)
              Reporter: Amir Youssefi (amirhyoussefi)
              Votes: 0
              Watchers: 10
