Affects Version/s: 1.13.0
Fix Version/s: None
The Drill documentation provides the "Sample Datasets" section, which is very handy. However, this section does not discuss the two datasets provided with Drill itself.
- Julian Hyde's FoodMart data set, available on the class path.
- TPC-H data set, available on the class path under `tpch`.
The "FoodMart" data set is available directly under `cp`. In fact, the Drill sample query (see below) references a FoodMart table. To see the list of tables (at development time), find the foodmart-data-json-0.4.jar file in the Maven dependencies for drill-java-exec. The table names here are simplified relative to those in the ER diagram in the above link. Perhaps include a simple table of the names with their mapping to the original names, and link to (or embed) the FoodMart ER image. The data is available in JSON format.
TPC-H data is available in `cp`.`tpch/*.parquet`, in Parquet format. The schema is described in the TPC-H specification.
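For illustration, the documentation could show one query per data set, along the lines of the sketch below. It assumes the default `cp` storage plugin is enabled, and that `employee.json` (FoodMart) and `tpch/nation.parquet` (TPC-H) are the file names on the class path; these should be checked against the jars before being documented.

```sql
-- FoodMart (JSON), directly under cp; file name assumed to be employee.json
SELECT * FROM cp.`employee.json` LIMIT 5;

-- TPC-H (Parquet), under cp in the tpch directory; file name assumed to be nation.parquet
SELECT * FROM cp.`tpch/nation.parquet` LIMIT 5;
```

The back-ticks around the file path are needed because the path contains characters (`.` and `/`) that are not valid in a bare SQL identifier.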
These are very handy, but hard to find: I find I must keep searching the source code to remember file names and directory paths. End users won't have this luxury.
Suggestion: Describe the files available in the class path data source.
Along these same lines, in "Connect a Data Source", there is no mention of the class path data source. Yet, we reference that data source in the Web Console where we suggest a sample query to run:
The above query refers to the FoodMart data set.