IMPALA-2308

Include TPC data needed for testing


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.0
    • Fix Version/s: None
    • Component/s: Infrastructure

    Description

      External contributors need the TPC data for testing, but the data is hosted on an internal Cloudera server.

      On my desktop, building dbgen and generating the data (about 13.4 s wall-clock) is slightly faster than downloading and extracting it (about 14.2 s).

      [casey@casey-desktop dbgen]$ time (wget http://util-1.ent.cloudera.com/impala-test-data/tpch.tar.gz && tar xzf tpch.tar.gz)
      --2015-09-06 14:34:52--  http://util-1.ent.cloudera.com/impala-test-data/tpch.tar.gz
      Resolving util-1.ent.cloudera.com (util-1.ent.cloudera.com)... 10.17.181.200
      Connecting to util-1.ent.cloudera.com (util-1.ent.cloudera.com)|10.17.181.200|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 313336602 (299M) [application/x-gzip]
      Saving to: ‘tpch.tar.gz’
      
      tpch.tar.gz                                                 100%[===========================================================================================================================================>] 298.82M  35.3MB/s   in 8.6s   
      
      2015-09-06 14:35:01 (34.8 MB/s) - ‘tpch.tar.gz’ saved [313336602/313336602]
      
      
      real	0m14.152s
      user	0m5.518s
      sys	0m1.744s
      
      [casey@casey-desktop dbgen]$ time (make clean && make && ./dbgen -f)
      rm -f dbgen qgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o 
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o build.o build.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o driver.o driver.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o bm_utils.o bm_utils.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o rnd.o rnd.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o print.o print.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o load_stub.o load_stub.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o bcd2.o bcd2.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o speed_seed.o speed_seed.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o text.o text.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o permute.o permute.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o rng64.o rng64.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o qgen.o qgen.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64   -c -o varsub.o varsub.c
      gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm
      TPC-H Population Generator (Version 2.17.0)
      Copyright Transaction Processing Performance Council 1994 - 2010
      
      real	0m13.425s
      user	0m11.869s
      sys	0m0.654s
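
The comparison between the two runs can be checked with a little arithmetic. The sketch below is only an illustration: `to_seconds` is a hypothetical helper (not part of Impala or dbgen), and the two durations are the `real` values copied from the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: convert a `time`-style duration such as "0m14.152s"
# into seconds by splitting on the "m" and "s" markers.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock ("real") values copied from the transcript.
download_s=$(to_seconds "0m14.152s")   # wget + tar
generate_s=$(to_seconds "0m13.425s")   # make clean + make + dbgen -f

# Print both timings and the absolute and relative difference.
awk -v d="$download_s" -v g="$generate_s" 'BEGIN {
    printf "download+extract: %.3fs\n", d
    printf "build+generate:   %.3fs\n", g
    printf "difference:       %.3fs (%.1f%% of the download time)\n", d - g, (d - g) / d * 100
}'
```

These are single measurements on one desktop, so the gap is small enough that it could change with network conditions or hardware.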
      

      I'm pretty sure the TPC license allows us to include dbgen in our repo. We'd never be shipping it anyway.

            People

              Assignee: Ishaan Joshi (ishaan)
              Reporter: casey (caseyc)