Following on from IMPALA-8121, I don't think we can enable the data cache by default, since it depends on what volumes are available to the container at runtime. But we should definitely enable it for tests.
Michael Ho said:
When I tested with the data cache enabled in a 3-node mini-cluster using the default workload scale, I ran with 500 MB and 1 partition by running:
start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB
You can also specify a pre-existing directory as a startup flag of Impala, like
start-impala-cluster.py already mounts some host directories into the container, so we could either do the same for the data cache, or just depend on the container root filesystem (which is likely to be slow, unfortunately).
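For the test-only case, here is a rough sketch of how a test harness might assemble that command line. The helper name `build_data_cache_args` and the directory layout are made up for illustration; only the two flags come from the command quoted above. It assumes the cache directory must exist before the cluster starts, so it creates it first.

```python
import os
import tempfile


def build_data_cache_args(cache_dir, cache_size="500MB"):
    """Return a start-impala-cluster.py command line with the data cache enabled.

    Hypothetical helper: it only reuses the --data_cache_dir and
    --data_cache_size flags from the command quoted in this thread.
    """
    # Assumption: the directory has to exist before the cluster starts,
    # so create it up front (no error if it is already there).
    os.makedirs(cache_dir, exist_ok=True)
    return [
        "start-impala-cluster.py",
        "--data_cache_dir={}".format(cache_dir),
        "--data_cache_size={}".format(cache_size),
    ]


if __name__ == "__main__":
    # Illustrative location only; a host directory mounted into the
    # container would be passed here instead.
    cache_dir = os.path.join(tempfile.gettempdir(), "impala-data-cache")
    print(" ".join(build_data_cache_args(cache_dir)))
```

Whether `cache_dir` ends up on a mounted host directory or on the container root filesystem is exactly the open question above; the helper takes the path as a parameter so either choice can be tried.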