Details
- Bug
- Status: Resolved
- Minor
- Resolution: Duplicate
- Impala 2.0
- None
Description
External contributors need the TPC data for testing, but the data is hosted on an internal Cloudera server that they cannot reach.
On my desktop, building dbgen and generating the data (13.4s) is slightly faster than downloading and extracting the tarball (14.2s).
[casey@casey-desktop dbgen]$ time (wget http://util-1.ent.cloudera.com/impala-test-data/tpch.tar.gz && tar xzf tpch.tar.gz)
--2015-09-06 14:34:52-- http://util-1.ent.cloudera.com/impala-test-data/tpch.tar.gz
Resolving util-1.ent.cloudera.com (util-1.ent.cloudera.com)... 10.17.181.200
Connecting to util-1.ent.cloudera.com (util-1.ent.cloudera.com)|10.17.181.200|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 313336602 (299M) [application/x-gzip]
Saving to: ‘tpch.tar.gz’

tpch.tar.gz 100%[===========================================================================================================================================>] 298.82M 35.3MB/s in 8.6s

2015-09-06 14:35:01 (34.8 MB/s) - ‘tpch.tar.gz’ saved [313336602/313336602]

real 0m14.152s
user 0m5.518s
sys 0m1.744s

[casey@casey-desktop dbgen]$ time (make clean && make && ./dbgen -f)
rm -f dbgen qgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
gcc -O3 -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010

real 0m13.425s
user 0m11.869s
sys 0m0.654s

I'm pretty sure the TPC license allows us to include dbgen in our repo. We'd never be shipping it anyway.
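If dbgen were checked into the repo, the data-loading step could build it and generate the data locally instead of fetching the tarball. The following is only a rough sketch of that idea, not the actual Impala data-loading script: the function name generate_tpch, its arguments, and the default scale factor of 1 are assumptions made for illustration. It uses the same `make` and `./dbgen -f` steps as the session above, plus dbgen's `-s` scale-factor option.

```shell
# Hypothetical helper (not part of Impala): build dbgen from a checked-in
# source tree and generate the TPC-H data locally.
#   $1  directory holding the dbgen sources and makefile (assumed layout)
#   $2  scale factor passed to dbgen -s (defaults to 1)
generate_tpch() {
  local dbgen_dir=${1:?usage: generate_tpch DBGEN_DIR [SCALE]}
  local scale=${2:-1}

  # Refuse to continue if the directory does not look like a dbgen checkout.
  if [ ! -f "$dbgen_dir/makefile" ] && [ ! -f "$dbgen_dir/Makefile" ]; then
    echo "no makefile found in $dbgen_dir" >&2
    return 1
  fi

  (
    # Run in a subshell so the caller's working directory is untouched.
    cd "$dbgen_dir" || exit 1
    make >/dev/null || exit 1
    # -f overwrites existing output files, as in the timing run above.
    ./dbgen -f -s "$scale"
  )
}
```

A call such as `generate_tpch ./dbgen 1` would then take the place of the wget and tar step. Because the function runs dbgen from inside the source directory, the generated files should end up there as well, as in the session above.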
Issue Links
- duplicates IMPALA-3227 Ensure that tests and dataloading can be run efficiently w/o Cloudera infra (Resolved)