• Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • llap


      There are two distinct options for caching encoded data in a row-columnar format: caching logical chunks (e.g. for ORC, stripe x column or row group x column), or caching physical chunks (e.g. for ORC, compression buffers or entire stripes). For highly selective queries, the former will probably result in better cache utilization and fewer undesirable priority phenomena. It will also be easier to reuse across different file formats.
      However, because logical chunks are variable-sized, such a cache is harder to implement. The prototype has a cache of this form, but it has some serious shortcomings in its current state. Additionally, a high-level cache would operate above the ACID logic in the file format and would thus require cache invalidation, which is, as we know, one of the only hard things in CS.
      A low-level cache for the ORC case, however, is easier to implement because the uncompressed size of compression buffers is nearly fixed; at the 256k default, these buffers are also sufficiently granular. While it lacks the benefit of having ACID deltas already merged, as a high-level cache would, it will work with ACID out of the box.
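
      To illustrate why nearly fixed-size buffers simplify things, here is a minimal sketch of a low-level cache keyed by file and buffer offset. It is not the implementation proposed in this JIRA: the class and method names (LowLevelCacheSketch, ChunkKey, put, get) are hypothetical, and LRU eviction is an assumption, since the description does not specify an eviction policy. Because every entry occupies at most one 256k slot, capacity accounting reduces to counting entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: LRU cache of uncompressed ORC compression buffers. */
final class LowLevelCacheSketch {
    /** Default compression buffer size mentioned in the description (256k). */
    static final int BUFFER_SIZE = 256 * 1024;

    /** Cache key (assumed): file identifier plus byte offset of the buffer in that file. */
    static final class ChunkKey {
        final String fileId;
        final long offset;

        ChunkKey(String fileId, long offset) {
            this.fileId = fileId;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ChunkKey)) {
                return false;
            }
            ChunkKey other = (ChunkKey) o;
            return offset == other.offset && fileId.equals(other.fileId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fileId, offset);
        }
    }

    private final LinkedHashMap<ChunkKey, byte[]> buffers;

    LowLevelCacheSketch(long capacityBytes) {
        // Each entry is charged one full slot, so capacity is just a buffer count.
        final int maxBuffers = (int) Math.max(1L, capacityBytes / BUFFER_SIZE);
        // accessOrder = true keeps the least recently used entry first.
        this.buffers = new LinkedHashMap<ChunkKey, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<ChunkKey, byte[]> eldest) {
                return size() > maxBuffers;
            }
        };
    }

    /** Stores one uncompressed buffer; rejects anything larger than a slot. */
    synchronized void put(String fileId, long offset, byte[] data) {
        if (data.length > BUFFER_SIZE) {
            throw new IllegalArgumentException("buffer exceeds slot size");
        }
        buffers.put(new ChunkKey(fileId, offset), data);
    }

    /** Returns the cached buffer, or null on a miss. */
    synchronized byte[] get(String fileId, long offset) {
        return buffers.get(new ChunkKey(fileId, offset));
    }

    synchronized int size() {
        return buffers.size();
    }
}
```

      A logical-chunk cache could not charge every entry a single fixed slot in this way, because its entries vary in size; it would need per-entry size tracking and an allocator that copes with fragmentation.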

      This JIRA is to implement the low-level cache.




            sershe Sergey Shelukhin