Details
- Improvement
- Status: Resolved
- Critical
- Resolution: Fixed
- Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
- None
Description
The metadata of large tables can become quite large, making it costly to hold in the statestore and disseminate to coordinator impalads. It can even grow so large that fundamental limits, such as the JVM 2GB array size limit and the Thrift 4GB limit, are hit, leading to downtime.
To reduce the statestore metadata topic size, we have an existing "compact_catalog_topic" flag, which LZ4-compresses the metadata payload for the C++ code paths catalogd->statestore and statestore->impalad.
Unfortunately, the metadata is not compressed in the same way during the FE->BE transition on the catalogd and the BE->FE transition on the impalad.
The goal of this change is to enable end-to-end compression for the full path of metadata dissemination. The existing code paths also need significant cleanup/streamlining. Ideally, the new code should provide consistent size limits everywhere.
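As a rough illustration of the idea (not Impala's actual code), the sketch below compresses a serialized payload once at the sending side and checks a single size limit before it crosses a boundary. The function names, the sample payload, and the choice of limit are hypothetical, and zlib stands in for LZ4 only so that the example runs with the Python standard library.

```python
import zlib

# Assumption for illustration: use the JVM array ceiling (2**31 - 1 bytes)
# as the one limit enforced everywhere.
JVM_MAX_ARRAY_BYTES = 2**31 - 1


def compress_payload(serialized: bytes, max_bytes: int = JVM_MAX_ARRAY_BYTES) -> bytes:
    """Compress a serialized metadata payload and enforce one size limit.

    zlib is a stand-in for LZ4 here; the names are hypothetical.
    """
    compressed = zlib.compress(serialized)
    if len(compressed) > max_bytes:
        raise ValueError(
            f"compressed payload is {len(compressed)} bytes, limit is {max_bytes}"
        )
    return compressed


def decompress_payload(compressed: bytes) -> bytes:
    """Inverse of compress_payload, applied on the receiving side."""
    return zlib.decompress(compressed)


# Made-up, repetitive metadata (many similar partition descriptors).
fake_metadata = b"partition=2018-01-01;location=/warehouse/t/p;" * 10_000
wire_bytes = compress_payload(fake_metadata)

# The round trip must be lossless.
assert decompress_payload(wire_bytes) == fake_metadata
print(len(fake_metadata), "->", len(wire_bytes))
```

Keeping the payload compressed across every hop, and checking the same limit at each one, is what "end-to-end" and "consistent size limits" would mean in a sketch like this.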
Issue Links
- Blocked
  - IMPALA-2648 catalogd crashes when serialized messages are over 2 GB (Resolved)
- breaks
  - IMPALA-6683 Restarting the Catalog without restarting Impalad and SS can block topic updates (Resolved)
  - IMPALA-6793 Metadata doesn't recover after restarting statestore (Resolved)
- causes
  - IMPALA-6599 Log spam: ImpaladCatalog.java:525] NativeLibCacheSetNeedsRefresh(hdfs://localhost:20500/test-warehouse/test-udfs.ll) failed. (Resolved)
- is related to
  - IMPALA-3173 Reduce catalog's memory footprint (Open)
- relates to
  - IMPALA-6838 Compress metadata in DDL operations (Open)