IMPALA / IMPALA-11904

Data cache should support dumping metadata for reloading


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.3.0
    • Component/s: Backend
    • Labels: None

    Description

      The data cache consists mainly of cache metadata and cache files. The cache files reside on disk and store the cached data, while the cache metadata resides in memory and maps each cache key to its location in a cache file.
      Currently, if the impalad process exits, the cache metadata is lost. After the impalad process restarts, the cache files cannot be reused even though they are still on disk, because there is no corresponding cache metadata to index them.
      If we support dumping the cache metadata to disk when the process exits, it can be reloaded into memory the next time the process starts, and the previous cache files can be reused. This would be helpful in real production environments, where the cached data often exceeds a terabyte per process and a cache lost to a configuration change or version upgrade can take days to rebuild.


          People

            Assignee: eyizoha Zihao Ye
            Reporter: eyizoha Zihao Ye
            Votes: 0
            Watchers: 3
