Tajo has many configurations. For user convenience, each of them has a default value. Most of the default values make sense, but some do not. Here are my ideas.
- Tajo can choose between a hash-based algorithm and a disk-based algorithm according to the remaining resources and the query workload. Users can give a hint through the session variables named HASH_*_SIZE_LIMIT. Their current default value is 256 MB, which is too large: it causes frequent GC and sometimes even OOM.
- Query cache should be disabled by default. It will not be commonly used.
- Configurations for broadcast join should accept numbers in MB.
- Some configurations use different units in tajo-site.xml and in the corresponding session variable. The units must be consistent.
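
To make the first and last points concrete, here is a minimal sketch of how a size limit expressed in MB could drive the choice between a hash-based and a disk-based algorithm. It is not Tajo's actual planner code: the class AlgorithmChooser, the methods mbToBytes and choose, and the simple "estimated input size versus limit" rule are all assumptions made for illustration. The real decision may also depend on the remaining resources and the query workload.

```java
// Illustrative sketch only; none of these names come from the Tajo code base.
public class AlgorithmChooser {
    enum Algorithm { HASH, DISK_BASED }

    static final long BYTES_PER_MB = 1024L * 1024L;

    // Hypothetical helper: the limit is always given in MB, both in the
    // site configuration and in the session variable, and is converted
    // to bytes in exactly one place.
    static long mbToBytes(long megabytes) {
        return megabytes * BYTES_PER_MB;
    }

    // Assumed rule: use the hash-based algorithm only when the estimated
    // input fits within the limit; otherwise fall back to the disk-based one.
    static Algorithm choose(long estimatedInputBytes, long hashSizeLimitMb) {
        return estimatedInputBytes <= mbToBytes(hashSizeLimitMb)
                ? Algorithm.HASH
                : Algorithm.DISK_BASED;
    }

    public static void main(String[] args) {
        long input = mbToBytes(100);
        System.out.println(choose(input, 256)); // prints HASH
        System.out.println(choose(input, 64));  // prints DISK_BASED
    }
}
```

Under this rule, a 100 MB input is hashed in memory when the limit is 256 MB but goes to the disk-based algorithm when the limit is 64 MB, which is why a smaller default trades some speed for lower GC and OOM risk.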