Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
Description
Following on from IMPALA-8121, I don't think we can enable the data cache by default, since it depends on what volumes are available to the container at runtime. But we should definitely enable it for tests.
kwho said:
When I tested with the data cache enabled in a 3-node mini-cluster using the default workload scale, I used a 500 MB cache with 1 partition by running
start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB
You can also specify a pre-existing directory in an Impala startup flag, like
--data_cache=/tmp/data-cache-0:500MB
start-impala-cluster.py already mounts some host directories into the container, so we could either do the same for the data cache, or just depend on the container root filesystem (which is likely to be slow, unfortunately).
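As a rough illustration of the first option, a test harness could create one cache directory per node and build the flag in the `dir:size` form quoted above. This is only a sketch: `build_data_cache_flag`, the `data-cache-N` naming, and the temporary base directory are hypothetical, and nothing here reflects how start-impala-cluster.py is actually implemented. For the Docker case, the base directory would presumably be a host directory mounted into the container.

```python
import os
import tempfile


def build_data_cache_flag(base_dir, node_index, size="500MB"):
    """Create a per-node cache directory and return a startup flag for it.

    Hypothetical helper: the flag format mirrors the
    --data_cache=<dir>:<size> example quoted in this issue.
    """
    cache_dir = os.path.join(base_dir, "data-cache-%d" % node_index)
    # The directory must exist before it is handed to the daemon.
    os.makedirs(cache_dir, exist_ok=True)
    return "--data_cache=%s:%s" % (cache_dir, size)


if __name__ == "__main__":
    # Assumed layout: one directory per node in a 3-node mini-cluster.
    base = tempfile.mkdtemp(prefix="impala-test-")
    for i in range(3):
        print(build_data_cache_flag(base, i))
```

Keeping one directory per node avoids the nodes of a mini-cluster sharing a single cache directory on the same host.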