IMPALA-2308

Include TPC data needed for testing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.0
    • Fix Version/s: None
    • Component/s: Infrastructure
    • Labels:

      Description

      External contributors need the TPC data for testing, but the data is hosted on an internal Cloudera server.

      On my desktop, generating the data (about 13.4 s wall-clock, including the build) is slightly faster than downloading and extracting it (about 14.2 s).

      [casey@casey-desktop dbgen]$ time (wget http://util-1.ent.cloudera.com/impala-test-data/tpch.tar.gz && tar xzf tpch.tar.gz)
      --2015-09-06 14:34:52--  http://util-1.ent.cloudera.com/impala-test-data/tpch.tar.gz
      Resolving util-1.ent.cloudera.com (util-1.ent.cloudera.com)... 10.17.181.200
      Connecting to util-1.ent.cloudera.com (util-1.ent.cloudera.com)|10.17.181.200|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 313336602 (299M) [application/x-gzip]
      Saving to: ‘tpch.tar.gz’
      
      tpch.tar.gz                                                 100%[===========================================================================================================================================>] 298.82M  35.3MB/s   in 8.6s   
      
      2015-09-06 14:35:01 (34.8 MB/s) - ‘tpch.tar.gz’ saved [313336602/313336602]
      
      
      real	0m14.152s
      user	0m5.518s
      sys	0m1.744s
      
      [casey@casey-desktop dbgen]$ time (make clean && make && ./dbgen -f)
      rm -f dbgen qgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o 
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o build.o build.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o driver.o driver.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o bm_utils.o bm_utils.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o rnd.o rnd.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o print.o print.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o load_stub.o load_stub.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o bcd2.o bcd2.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o speed_seed.o speed_seed.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o text.o text.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o permute.o permute.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o rng64.o rng64.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o qgen.o qgen.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o varsub.o varsub.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm
      TPC-H Population Generator (Version 2.17.0)
      Copyright Transaction Processing Performance Council 1994 - 2010
      
      real	0m13.425s
      user	0m11.869s
      sys	0m0.654s
      

      I'm pretty sure the TPC license allows us to include dbgen in our repo. We'd never be shipping it anyway.
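
The build-and-generate commands timed above could be wrapped in a small helper so that contributors without access to the internal server can produce the data themselves. The following is only a sketch under stated assumptions: the dbgen sources are already checked out in some directory (the directory argument and both function names are hypothetical, not part of Impala), and dbgen writes its eight `.tbl` files to the working directory, which is its behavior in the run shown above.

```shell
#!/usr/bin/env bash
# Sketch only: build dbgen from source and generate TPC-H data locally.
# Function names and the directory argument are hypothetical.

# The eight table files that TPC-H dbgen writes.
TPCH_TABLES="customer lineitem nation orders part partsupp region supplier"

# Return 0 if every expected .tbl file exists and is non-empty in $1.
check_tpch_tables() {
  local dir="${1:-}" table
  for table in $TPCH_TABLES; do
    if [ ! -s "$dir/$table.tbl" ]; then
      echo "missing or empty: $dir/$table.tbl" >&2
      return 1
    fi
  done
}

# Build dbgen in $1 and generate the data there with the same commands
# as the timed run (make clean, make, ./dbgen -f), then verify the output.
generate_tpch() {
  local dbgen_dir="${1:-}"
  if [ -z "$dbgen_dir" ] || [ ! -d "$dbgen_dir" ]; then
    echo "usage: generate_tpch <dbgen source dir>" >&2
    return 1
  fi
  (
    cd "$dbgen_dir" || exit 1
    make clean && make && ./dbgen -f
  ) || return 1
  check_tpch_tables "$dbgen_dir"
}
```

After sourcing the file, a call such as `generate_tpch ./dbgen` would rebuild the tool and regenerate the data in place. Checking for the table files afterwards is a design choice to catch a build that succeeds but leaves the data incomplete.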

              People

              • Assignee: Ishaan Joshi (ishaan)
              • Reporter: casey (caseyc)
              • Votes: 0
              • Watchers: 3
