Details
Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Tajo has many configuration parameters. For user convenience, all of them have default values. Most of the defaults make sense, but some do not. Here are my proposals:
- Tajo chooses between hash-based and disk-based algorithms according to the remaining resources and the query workload. Users can give a hint through session variables named HASH_*_SIZE_LIMIT. Their current default value is 256 MB, which is too large: it causes frequent GC and sometimes even OOM.
- Query cache should be disabled by default, since it is unlikely to be commonly used.
- Broadcast join configurations should accept values in MB.
- Some configurations use different units in tajo-site.xml and in the corresponding session variable. The units must be consistent.
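Taken together, these proposals describe a simple rule: a size limit expressed in MB, set in tajo-site.xml and overridable per session in the same unit, decides whether an operator runs hash-based or disk-based. The Java sketch below illustrates that rule under stated assumptions: the key HASH_JOIN_SIZE_LIMIT stands in for the HASH_*_SIZE_LIMIT family, the 64 MB fallback is an illustrative value rather than the default actually adopted, and the Map-based lookup and the limitMb/chooseAlgorithm helpers are hypothetical, not Tajo's real TajoConf or SessionVars API.

```java
import java.util.HashMap;
import java.util.Map;

class SizeLimitSketch {
    // Hypothetical fallback (illustrative only): smaller than the current 256 MB default.
    static final long DEFAULT_HASH_SIZE_LIMIT_MB = 64;

    // Hypothetical lookup: a session variable overrides the site-level value.
    // Both sources are interpreted in the same unit (MB), so they stay consistent.
    static long limitMb(Map<String, String> siteConf, Map<String, String> session, String key) {
        String value = session.getOrDefault(key, siteConf.get(key));
        return value == null ? DEFAULT_HASH_SIZE_LIMIT_MB : Long.parseLong(value.trim());
    }

    // Convert a limit given in MB into bytes for comparison with size estimates.
    static long mbToBytes(long mb) {
        return mb * 1024L * 1024L;
    }

    // Pick the hash-based algorithm only if the estimated input fits under the limit;
    // otherwise fall back to the disk-based algorithm.
    static String chooseAlgorithm(long estimatedInputBytes, long limitMb) {
        return estimatedInputBytes <= mbToBytes(limitMb) ? "hash" : "disk";
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("HASH_JOIN_SIZE_LIMIT", "64");      // value from tajo-site.xml, in MB
        Map<String, String> session = new HashMap<>();
        session.put("HASH_JOIN_SIZE_LIMIT", "32");   // per-session hint, also in MB

        long limit = limitMb(site, session, "HASH_JOIN_SIZE_LIMIT");
        long estimate = 48L * 1024 * 1024;           // hypothetical 48 MB input estimate
        System.out.println(limit + " MB -> " + chooseAlgorithm(estimate, limit));
    }
}
```

With a 32 MB session hint, a 48 MB input estimate exceeds the limit, so the sketch selects the disk-based path. Reading both sources in a single unit is what the last proposal asks for: a value copied from tajo-site.xml into a session variable then means the same thing in both places.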