Description
Background:
Currently, a table on the Catalog server is either loaded or unloaded (an IncompleteTable). When the Catalog server starts for the first time, it fetches the list of table names for each database, and every table in these lists starts out unloaded. The table lists are propagated to the coordinators so that they know whether a table with a given name exists and can start analyzing queries. Incomplete tables carry no metadata (such as schema, ownership, or comments).
Table metadata is loaded lazily: a table moves into the loaded state the first time it is referenced in a query. When a load request comes in, all of the table's metadata is loaded, including file block information.
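The current binary lifecycle can be modeled roughly as follows. This is a simplified sketch, not Impala's catalogd code: `BinaryCatalog`, `CatalogTable`, `TableMeta`, and the `loader` callback are hypothetical names, and a `None` metadata field stands in for an IncompleteTable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TableMeta:
    # Hypothetical bundle of everything a full load produces.
    owner: str
    comment: str
    schema: Dict[str, str]
    file_blocks: Dict[str, List[int]]


@dataclass
class CatalogTable:
    name: str
    meta: Optional[TableMeta] = None  # None models an IncompleteTable

    @property
    def is_loaded(self) -> bool:
        return self.meta is not None


class BinaryCatalog:
    def __init__(self, names_by_db: Dict[str, List[str]],
                 loader: Callable[[str, str], TableMeta]):
        # Startup: only table names are fetched; every table starts unloaded.
        self._tables: Dict[Tuple[str, str], CatalogTable] = {
            (db, name): CatalogTable(name)
            for db, names in names_by_db.items()
            for name in names
        }
        self._loader = loader
        self.load_count = 0

    def table_exists(self, db: str, name: str) -> bool:
        # Answerable from the name list alone, without loading anything.
        return (db, name) in self._tables

    def is_loaded(self, db: str, name: str) -> bool:
        return self._tables[(db, name)].is_loaded

    def get_table_for_query(self, db: str, name: str) -> CatalogTable:
        tbl = self._tables[(db, name)]
        if not tbl.is_loaded:
            # First reference in a query: load *all* metadata at once,
            # including file block information.
            tbl.meta = self._loader(db, name)
            self.load_count += 1
        return tbl
```

The point of the model is that there is no middle ground: a table either has nothing beyond its name or has everything, so any request that needs even one field (such as the owner) must either go without it or pay for a full load.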
Problem:
Coordinators need some additional information when analyzing unloaded tables. One example is IMPALA-8228: ownership information is part of the HMS table schema, which is not loaded until the table is fully loaded. This is not a problem for regular queries (like select * from <tbl>), but it is an issue for queries like "show tables", which do not trigger a table load. In this case, because ownership information is missing, the output of the table listing can differ depending on whether the table is loaded. Another example is IMPALA-8606, where the GET_TABLES request does not return table comments because they are not available for unloaded tables.
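To make the listing discrepancy concrete, here is a toy model, not Impala's actual authorization logic. It assumes a hypothetical rule in which a user sees a table in "show tables" if they hold an explicit grant on it or own it; `ListedTable` and `show_tables` are illustrative names. Because an unloaded table has no owner recorded, the same table is hidden or shown depending only on its load state.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ListedTable:
    name: str
    owner: Optional[str]  # None: table is unloaded, so ownership is unknown


def show_tables(tables: List[ListedTable], user: str, granted: Set[str]) -> List[str]:
    """Hypothetical listing rule: visible if explicitly granted or owned by user."""
    visible = []
    for t in tables:
        # An unknown owner (None) never matches, so owner-based visibility
        # silently disappears for unloaded tables.
        if t.name in granted or t.owner == user:
            visible.append(t.name)
    return visible
```

For example, `show_tables([ListedTable("sales", None)], "bob", set())` returns an empty list, while the same call after the table is loaded with owner `"bob"` returns `["sales"]`.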
Ask:
We need to consider finer-grained loading on the Catalog server in general. Instead of a binary state (loaded vs. unloaded), a table could be in a partially loaded state. We could also start by aggressively fetching certain pieces of information that we think would aid analysis, and lazily load the remaining metadata. Finer-grained loading also integrates well with the LocalCatalog implementation on the coordinators, where the entire table need not be loaded on the Catalog server to serve partial metadata (e.g. show partitions <large-table>).
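A minimal sketch of what a partially loaded table could look like, assuming metadata is split into independently fetchable pieces. `Piece`, `EAGER`, `PartialTable`, and the `fetch` callback are hypothetical names, and the choice of which pieces to fetch eagerly is an assumption for illustration only.

```python
from enum import Enum, auto
from typing import Callable, Dict, Iterable, Set


class Piece(Enum):
    # Hypothetical breakdown of table metadata into independently loadable parts.
    OWNERSHIP = auto()
    COMMENT = auto()
    SCHEMA = auto()
    PARTITIONS = auto()
    FILE_BLOCKS = auto()


# Assumed cheap and useful for analysis, so fetched eagerly at startup.
EAGER = {Piece.OWNERSHIP, Piece.COMMENT}


class PartialTable:
    def __init__(self, name: str, fetch: Callable[[str, Piece], object]):
        self.name = name
        self._fetch = fetch
        self._values: Dict[Piece, object] = {}

    @property
    def loaded(self) -> Set[Piece]:
        return set(self._values)

    @property
    def state(self) -> str:
        # Three states instead of two: unloaded, partial, loaded.
        if not self._values:
            return "unloaded"
        return "loaded" if len(self._values) == len(Piece) else "partial"

    def ensure(self, pieces: Iterable[Piece]) -> None:
        # Fetch only what is missing; already-loaded pieces are reused.
        for p in pieces:
            if p not in self._values:
                self._values[p] = self._fetch(self.name, p)

    def get(self, piece: Piece) -> object:
        self.ensure([piece])
        return self._values[piece]
```

With this shape, startup could call `ensure(EAGER)` on every table so that ownership and comments are available to listing requests without a full load, and a `show partitions` request would call `ensure({Piece.PARTITIONS})` without ever triggering a `FILE_BLOCKS` fetch.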
Issue Links
- is related to:
  - IMPALA-7501 Slim down metastore Partition objects in LocalCatalog cache (Resolved)
  - IMPALA-7533 Optimize fetch-from-catalog by caching partitions across table versions (Resolved)
  - IMPALA-9936 Only send invalidations in DDL responses to LocalCatalog coordinators (Resolved)
  - IMPALA-5468 REFRESH and a very large table run after catalog restart cannot be cancelled (Open)
  - IMPALA-8228 Support for object ownership with Ranger authorization provider (Resolved)
  - IMPALA-6354 Consider using Guava LoadingCache to cache metadata objects opposed to a ConcurrentHashMap (Open)
  - IMPALA-8953 Tables and Databases sharing same name can cause query failures if table is not readable by Impala (Open)
Issues in epic
| Key | Summary | Status | Assignee |
|---|---|---|---|
| IMPALA-2129 | Catalogd support to load only table/column metadata | Open | Unassigned |
| IMPALA-3127 | Decouple partitions from tables | Resolved | Quanlong Huang |
| IMPALA-3234 | Catalog should send incremental metadata changes to Impalads | Open | Unassigned |
| IMPALA-3561 | Planner should request statistics for relevant columns not entire table | Open | Unassigned |
| IMPALA-9172 | Load DB/Table ownership info on start-up. | Open | Unassigned |
| IMPALA-9670 | Fix unloaded views are shown as tables for GET_TABLES requests | Resolved | Quanlong Huang |
| IMPALA-9703 | Skip loading partition meta and file meta for PB scale tables | Open | Zhi Tang |
| IMPALA-9778 | Refactor HdfsPartition to be immutable | Resolved | Quanlong Huang |
| IMPALA-9868 | Support on-demand partition loading for non-transactional tables in catalogd | Open | Quanlong Huang |
| IMPALA-9869 | Support on-demand partition loading for transactional tables in catalogd | Open | Unassigned |
| IMPALA-9937 | Catalogd should send incremental metadata updates in DDL responses to legacy coordinators | Open | Unassigned |
| IMPALA-9994 | Add exhaustive test to verify not leaking partitions in statestore catalog topic | Open | Quanlong Huang |
| IMPALA-10075 | Reuse existing instances of unchanged partitions in REFRESH | Resolved | Quanlong Huang |
| IMPALA-10076 | Reduce logs about partition level catalog updates | Resolved | Quanlong Huang |
| IMPALA-10079 | Performance test for partition level catalog updates | Open | Quanlong Huang |
| IMPALA-10113 | Add feature flag for incremental metadata update | Resolved | Quanlong Huang |
| IMPALA-10283 | IllegalStateException in applying incremental partition updates | Resolved | Quanlong Huang |