Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, writes to the data cache happen synchronously with HDFS file reads, and both are performed by the remote HDFS IO threads. On a cache miss, the IO thread must also write the data into the cache before it can move on. This can degrade query performance in some cases.
Therefore, the data cache should be able to hand the writes off to another thread (or thread pool) that writes asynchronously. The IO thread would only copy the data into a temporary buffer and then immediately return its own buffer to the scanner. The extra memory used to hold these temporary buffers must also be bounded.
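The issue does not say how the deferred writes are implemented. The sketch below shows one minimal way to do a bounded write-behind in Python. It is not the code that resolved this issue, and Impala itself is written in C++. The class name `AsyncCacheWriter`, the `store_fn` callback, `max_pending_bytes`, and the policy of skipping the cache write when the budget is exhausted are all hypothetical choices made for illustration.

On a miss, the IO thread calls `submit()`. That call copies the bytes, queues them, and returns at once. Worker threads then perform the real cache insertion and release the memory budget afterwards.

```python
import queue
import threading


class AsyncCacheWriter:
    """Hypothetical write-behind helper for a data cache (illustrative only).

    IO threads call submit() on a cache miss: the bytes are copied into a
    temporary buffer and queued, and the caller returns immediately. Worker
    threads drain the queue and perform the actual cache insertion. The total
    size of queued-but-unwritten buffers is capped at max_pending_bytes.
    """

    def __init__(self, store_fn, max_pending_bytes, num_workers=1):
        self._store_fn = store_fn              # performs the real (slow) cache write
        self._max_pending = max_pending_bytes  # bound on temporary-buffer memory
        self._pending = 0                      # bytes currently held in buffers
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        self._workers = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(num_workers)]
        for w in self._workers:
            w.start()

    def submit(self, key, data):
        """Queue a copy of data for caching; return False if over budget."""
        size = len(data)
        with self._lock:
            if self._pending + size > self._max_pending:
                # Assumed policy: skip caching rather than stall the IO thread.
                return False
            self._pending += size
        # Copy so the caller can hand its own buffer back to the scanner.
        self._queue.put((key, bytes(data)))
        return True

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                self._queue.task_done()
                return
            key, buf = item
            try:
                self._store_fn(key, buf)
            except Exception:
                pass  # a failed cache write only costs a future cache miss
            finally:
                with self._lock:
                    self._pending -= len(buf)  # release the memory budget
                self._queue.task_done()

    def close(self):
        """Wait for queued writes to finish, then stop the workers."""
        self._queue.join()
        for _ in self._workers:
            self._queue.put(None)
        for w in self._workers:
            w.join()
```

If the budget is exhausted, blocking the caller would be an alternative to dropping the write. However, blocking would put cache work back on the IO thread's critical path, which is the problem the issue describes.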