Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels: None

      Description

      To benchmark Pig performance, we need to have a TPC-H like Large Data Set plus Script Collection. This will be used to compare different Pig releases and to compare Pig against other systems (e.g. Pig + Hadoop vs. Hadoop only).

      Here is the wiki page for the small tests: http://wiki.apache.org/pig/PigPerformance

      I am currently running long-running Pig scripts over data sets on the order of tens of TBs. The next step is hundreds of TBs.

      We need an open large data set (open-source scripts which generate the data set) and detailed scripts for important operations such as ORDER, AGGREGATION, etc.

      We can call those the Pig Workouts: Cardio (short processing), Marathon (long running scripts) and Triathlon (Mix).

      I will update this JIRA with more details of current activities soon.

      Attachments

      1. generate_data.pl (10 kB) - Alan Gates
      2. perf.patch (153 kB) - Alan Gates
      3. perf.hadoop.patch (33 kB) - Ying He
      4. perf-0.6.patch (152 kB) - Daniel Dai
      5. pigmix2.patch (200 kB) - Daniel Dai
      6. pigmix_pig0.11.patch (194 kB) - Dmitriy V. Ryaboy
      7. pig-0.8.1-vs-0.9.0.png (8 kB) - Jie Li
      8. PIG-200-0.12.patch (218 kB) - Daniel Dai

          Activity

          Daniel Dai added a comment -

          Professor Roger Whitney has kindly agreed to change the license of sdsuLibJKD12 to Apache. Committed it to the Pig codebase and got rid of the manual step. Thanks, Professor Roger!

          Annie Lin added a comment -

          Hi Daniel, it seems that test/perf/pigmix/lib/sdsuLibJKD12.jar is missing in trunk. Can you check it in?

          Daniel Dai added a comment -

          Documentation is in https://cwiki.apache.org/confluence/display/PIG/PigMix

          Amir Youssefi added a comment -

          Thanks Daniel. I second Alan's "I think it would be good to get this checked in and maintained going forward."

          Daniel Dai added a comment -

          PIG-200-0.12.patch committed to trunk.

          Alan Gates added a comment -

          +1. Latest patch changes look good. I think it would be good to get this checked in and maintained going forward.

          Daniel Dai added a comment -

          To run it:
          ant -Dharness.hadoop.home=$HADOOP_HOME pigmix-deploy
          ant -Dharness.hadoop.home=$HADOOP_HOME pigmix

          Daniel Dai added a comment -

          PIG-200-0.12.patch works with trunk (0.12). I tested with both hadoop 1.0.4 and 2.0.3. The patch is ready for review and commit.

          Daniel Dai added a comment -

          Hi, Shaw, it cannot run directly with pig-0.11 yet. I will attach another patch shortly.

          shaw chopon added a comment -

          Has anyone succeeded in running PigMix on pig-0.11 and hadoop-2.0.x?

          Jie Li added a comment -

          Attached the complete results in case anybody is interested.

          Jie Li added a comment -

          Agreed! Created JIRA PIG-2661.

          Dmitriy V. Ryaboy added a comment -

          Funny, I was just looking at the same thing. It's Pig being silly about what projections do – try loading without a schema and ordering by the field ordinal. The extra MR job will go away.

          Seems like a serious performance regression that we should fix ASAP.
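
          A minimal sketch of that workaround, assuming the L9 script quoted in Jie Li's comment below (untested; only the load and order lines change):

           -- load without a schema; Pig then plans no extra projection/cast job
           A = load '$input/pigmix_page_views'
               using org.apache.pig.test.udf.storefunc.PigPerformanceLoader();
           -- query_term is the fourth field, hence ordinal $3
           B = order A by $3;
           store B into '$output/L9out';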

          Jie Li added a comment -

          Did anyone notice that pig-0.9.0 uses an extra job for L9 compared to pig-0.8.1?

          L9 is very simple (slightly changed to set default_parallel):

          SET default_parallel $factor
          
          register ../pigperf.jar;
          A = load '$input/pigmix_page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
              as (user, action, timespent, query_term, ip_addr, timestamp,
                  estimated_revenue, page_info, page_links);
          B = order A by query_term;
          store B into '$output/L9out';
          
          Output information of 0.8.1

          JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
          job_201204222028_0192 60 1 33 12 20 102 102 102 B SAMPLER
          job_201204222028_0193 60 17 78 39 57 533 147 360 B ORDER_BY /tmp/10m-0.8.1/L9out,

          Output information of 0.9.0

          JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
          job_201204222028_0269 60 0 171 27 116 0 0 0 A MAP_ONLY
          job_201204222028_0270 60 1 63 9 26 136 136 136 B SAMPLER
          job_201204222028_0271 60 17 183 30 66 657 262 446 B ORDER_BY /tmp/10m-0.9.0/L9out,

          We can see that 0.9.0 uses a MAP_ONLY job to load the data, which is almost as expensive as the ORDER_BY job. In my environment, with 4 slave nodes processing 10m records, it increases the running time from 1021 seconds (0.8.1) to 1921 seconds (0.9.0)!

          Does anybody know what happened?

          Jie Li added a comment -

          "To benchmark Pig performance, we need to have a TPC-H like Large Data Set plus Script Collection."

          Now we have the TPC-H benchmark available for Pig (PIG-2397)! It can be used to measure the performance of Pig's relational operators (projection, group, join, etc.). The complex workloads can also help enrich Pig's multi-query optimization. More importantly, we can use it to compare Pig with Hive and many other systems, see what we can learn from them, and get a more intuitive understanding of Pig's performance.

          So I propose including TPC-H as part of PigMix.

          Dmitriy V. Ryaboy added a comment -

          By the way, getting rid of the hardcoded "parallel 40" is a suggestion; feel free to push back. I haven't looked at what this does to multi-terabyte-sized loads. I do think we should consider measuring the sum of task times instead of overall wall-clock time – it's a better measure of performance.

          Dmitriy V. Ryaboy added a comment -

          Attaching a patch that works with pig 0.11 (current trunk).

          A few changes:

          1) Removed explicit hardcoded parallelism inside the pig scripts to let them scale automatically.

          2) All scripts respect the PIGMIX_DIR environment variable, so you don't have to have /user/pig on your cluster – set $PIGMIX_DIR and use your own paths (see the sketch after this list).

          3) Made the shell scripts respect $HADOOP_CLASSPATH.

          4) PigPerformanceLoader is now part of e2e, so it was removed from this patch. Fixed the e2e version to proxy bytesToMap to its caster instead of throwing an exception.

          5) Use /usr/bin/env to find perl, as not everyone has it in the same place. Use strict and warnings in the Perl scripts.
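
          A minimal sketch of an invocation under these changes; the runner-script name and all paths here are assumptions, not taken from the patch:

           # use your own HDFS root instead of /user/pig
           export PIGMIX_DIR=/benchmarks/pigmix
           # extra jars for the shell scripts to pick up
           export HADOOP_CLASSPATH=/opt/extra-jars/mylib.jar
           ./runpigmix.pl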

          Daniel Dai added a comment -

          Quoting Todd: "above Daniel says that you only need to resize 'pages' in order to change the other datasets. But, the script has a variable defined for 'widerowcnt' which implies otherwise. Do we need to manually adjust that or is it ignored?"

          widerows is the one exception: you also need to specify its size. It is only used in L11, and it should increase proportionally with the pageviews.
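
          A hypothetical excerpt of the size knobs in generate_data.sh (variable names taken from this discussion; the values are only examples):

           pages=625000000       # resize this one; the other tables scale off it automatically
           widerowcnt=10000000   # the exception: set explicitly, proportional to pages (used only by L11)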

          Todd Lipcon added a comment -

          To run in 0.9, I had to make some changes:

          • the new signature of bytesToMap in PigPerformanceLoader.Caster has to delegate to the old implementation (not throw UnsupportedOperationException)
          • the loader script should really have a "set -e", and ideally figure out which generation steps have already completed successfully (mine failed several hours in due to an OOME trying to load a giant lookaside map); see the sketch after this list
          • above Daniel says that you only need to resize "pages" in order to change the other datasets. But the script has a variable defined for "widerowcnt", which implies otherwise. Do we need to manually adjust that, or is it ignored? If ignored, it should be set to 0 with a comment explaining that
          • cat $powerusers/* will try to cat a _logs directory – it should probably only cat the data files
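
          A sketch of the loader-script hardening suggested above; the step and generator-script names are hypothetical:

           #!/usr/bin/env bash
           set -e                                  # abort on the first failed step

           # run a generation step only if its output does not already exist
           generate_step() {
               local out="$1"; shift
               if hadoop fs -test -e "$out"; then
                   echo "skipping $out (already generated)"
               else
                   "$@"
               fi
           }

           generate_step pigmix_pages ./generate_pages.sh
           generate_step pigmix_users ./generate_users.sh

           # cat only the data files, not the _logs directory
           hadoop fs -cat "$powerusers"/part-*
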
          Daniel Dai added a comment -

          All other tables derive their data from pages. If you resize pages, the other tables change automatically to maintain the proper ratios.

          Mostafa Ead added a comment -

          Hello Daniel,

          I am using pigmix2.patch now. It generates 625m records for the pages table, which is too large for the available disk space on my cluster. I would like to generate only 100m records of pages. Is there a ratio I should maintain between the size of the pages table and the other tables (users and power users)?

          Thanks.
          Mostafa Ead

          Daniel Dai added a comment -

          pigmix2.patch includes all queries for PigMix2: the original 12 PigMix queries plus 5 new queries that measure the performance of new Pig features. The patch also contains a map-reduce version of the data generator. To use it (a consolidated shell session follows):
          1. Download the pig 0.7 release
          2. Apply the patch
          3. Copy http://www.eli.sdsu.edu/java-SDSU/sdsuLibJKD12.jar to lib
          4. ant jar pigperf
          5. You will use pig.jar and pigperf.jar. Scripts are in test/utils/pigmix/scripts. To generate data, use generate_data.sh. To run PigMix2, use runpigmix-adhoc.pl.
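
          The same steps as a shell session (a sketch; the patch level and exact script paths may differ in your checkout):

           cd pig-0.7.0
           patch -p0 < pigmix2.patch
           wget -P lib http://www.eli.sdsu.edu/java-SDSU/sdsuLibJKD12.jar
           ant jar pigperf
           test/utils/pigmix/scripts/generate_data.sh      # generate the data
           test/utils/pigmix/scripts/runpigmix-adhoc.pl    # run PigMix2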

          Daniel Dai added a comment -

          Hi Duncan,
          I tried and didn't see errors. Are you using the pig 0.6 release? What error message did you see?

          duncan added a comment -

          Hi,

          I encountered a build failure when I executed the command "ant jar compile-test".
          What do I need to install before executing this command?
          Thanks

          Duncan

          duncan added a comment -

          Thank you very much, Daniel!

          Daniel Dai added a comment -

          Missed one step: before you build, get http://www.eli.sdsu.edu/java-SDSU/sdsuLibJKD12.jar and put it in your lib directory.

          Daniel Dai added a comment -

          Hi Duncan,
          perf.patch is a little bit old; I have attached a new perf-0.6.patch. Instructions to generate the input data for PigMix:
          1. Apply perf-0.6.patch to the pig 0.6 release
          2. ant jar compile-test
          3. export PIG_HOME=.
          4. test/utils/pigmix/datagen/generate_data.sh

          duncan added a comment -

          Hi Daniel,

          How can I run the perf.patch? I saw a lot of different things in it.
          I want to generate the data set and use those 14 Pig queries for benchmarking.

          Would you mind telling me more about how to use the perf.patch?

          Thanks

          Duncan

          Daniel Dai added a comment -

          Yes, as the name suggests, generate_data.sh will generate the input file for the queries.

          duncan added a comment -

          How can I run the perf.patch? Do I need generate_data.sh in order to run those 14 queries?

          Ying He added a comment -

          doc for DataGenerator in hadoop mode is here: http://wiki.apache.org/pig/DataGeneratorHadoop

          Ying He added a comment - edited

          perf.hadoop.patch adds support for running the DataGenerator in hadoop mode. It should be applied on top of perf.patch.

          Zheng Shao added a comment -

          We made a benchmark for Hive based on the queries from the SIGMOD 2009 paper.
          https://issues.apache.org/jira/browse/HIVE-396

          We also spent a lot of time writing Pig programs for those queries, and we have some preliminary results.
          Will somebody from the pig team take a look and help improve the pig queries?

          Olga Natkovich added a comment -

          PigMix is our set of benchmarks going forward.

          Alan Gates added a comment -

          The attached patch takes a different approach to providing a set of benchmarks for pig. It contains a set of 14 queries designed to cover a range of the ways users use pig. It also includes implementations of the same queries in java map-reduce code, so that developers can compare pig performance against map-reduce performance. See http://wiki.apache.org/pig/PigMix for information on how the queries were chosen, how the data is constructed, and data from an initial run of 0.1.0 pig versus the soon-to-be 0.2.0 pig.

          This attachment is not ready for inclusion in the code. It has several issues.

          1. The library used to generate the zipf distributions in the data is under the GNU public license and thus cannot be included. The library can be obtained at http://www.eli.sdsu.edu/java-SDSU/
          2. The data generation script is single-threaded because the zipf distribution generator is. This means generating 10m rows of data (about 15G) takes ~48 hours. I'd like to be able to generate larger data sets, but first I need to find a parallel zipf distribution generator that has a compatible license (or write one, which I don't really want to do).
          3. There are places in the code (particularly the map-reduce code) where path names etc. are hard-wired to locations in my test setup. These need to be generalized.
          Alan Gates added a comment -

          Pi, here's the perl script I use to generate data for end-to-end testing, large and small.

          Pi Song added a comment -

          To move this forward, I propose that we have:

          1) A small dataset test: just a set of simple benchmarks for running on a development box. On a common machine, it shouldn't take longer than 30 minutes in total.
          2) A large dataset test, for real benchmarking.

          I think the test cases in http://wiki.apache.org/pig/PigPerformance should already provide good coverage. We will have to do another review once the query optimizer is in place.

          Alan, could you please attach the perl script so that I can see what it does?

          Arun C Murthy added a comment -

          I have a perl script that I use to generate data for pig testing. [...]

          I managed to use the same script via a Pig-Streaming job to generate large amounts of data in parallel; the only change I needed was to edit it to switch off the sql generation...
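
          A minimal sketch of that Pig-Streaming trick, assuming a generator script generate_data.pl that emits rows for each input seed (all names here are illustrative):

           DEFINE gen `generate_data.pl` SHIP('generate_data.pl');
           seeds = load 'seed_file' as (seed);   -- one seed per desired map task
           big = stream seeds through gen;       -- each task runs the generator in parallel
           store big into 'large_dataset';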

          Alan Gates added a comment -

          Amir,

          I have a perl script that I use to generate data for pig testing. It can generate different types of data and different sizes. For a given type and size, it always produces the same set. However, for data of the same type and different sizes, the smaller is not a subset of the larger. This allows it to produce good data for testing joins. If you're interested in using this for your benchmarking, let me know.


            People

             • Assignee: Alan Gates
             • Reporter: Amir Youssefi
             • Votes: 0
             • Watchers: 11
