We need a better input sampling to serve at least two purposes:
1. test their queries against a smaller data set
2. understand more about how the data look like without scanning the whole table.
A simple function that gives a subset splits will help in those cases. It doesn't have to be strict sampling.