Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
This Jira tracks the work to improve the Kudu C++ client's metacache integrity when the C++ and Java clients are both used on the same partition in a particular sequence of steps that results in a stale cache entry in the Kudu C++ client.
Here is a detailed step-wise execution of a test that could result in such a situation:
+++
The metacache in the Kudu client does not clean up the old tablet from its cache after a new partition with the same range is created.
The new tablet id is the valid one, is the one returned in the server response, and should be used everywhere from then on.
If we look at the steps to repro:
1. We first create the table with the following query:
+++
/** 1. Create table **/
drop table if exists impala_crash;
create table if not exists impala_crash (
  dt string,
  col string,
  primary key(dt)
)
partition by range(dt) (
  partition values <= '00000000'
)
stored as kudu;
+++
2. Then, the table is altered by adding a partition with range 20230301, and a row is inserted:
+++
alter table impala_crash drop if exists range partition value='20230301';
alter table impala_crash add if not exists range partition value='20230301';
insert into impala_crash values ('20230301','abc');
+++
3. Then, we alter the table again, adding a partition with the same range after dropping the old partition:
+++
alter table impala_crash drop if exists range partition value='20230301';
alter table impala_crash add if not exists range partition value='20230301';
insert into impala_crash values ('20230301','abc');
+++
Even though the old partition is dropped and a new one is added, the old cache entry (with the old tablet id) remains in the Kudu client metacache, although it is marked as stale.
When we try to write a new value to the same range, the client first looks up the entry (by tablet id) in the metacache and finds it to be stale. As a result, a lookup RPC is issued to the server, and the response payload contains the new tablet id, since the old tablet no longer exists on the server. The new tablet id is recorded in the client metacache. When PickLeader resumes, it re-enters the lookup cycle, which now succeeds via the fast path because the latest entry is present in the cache. But when its callback is invoked, it resumes work with the old tablet id still at hand, and that id is never updated (see the client-side sketch after this block).
+++
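To make the write path above concrete, here is a minimal sketch of the C++ client calls involved. This is not Impala's actual code; the table and column names come from the repro above, while the master address, flush handling, and error handling are simplified placeholders. The DDL in the middle is performed out of band by Impala through the Java client, which is exactly why this C++ client's metacache is never told about the new tablet.
+++
// Minimal sketch of the C++ client write path in the repro; not Impala's
// actual code. The master address and error handling are placeholders.
#include <iostream>
#include <string>

#include <kudu/client/client.h>

using kudu::Status;
using kudu::client::KuduClient;
using kudu::client::KuduClientBuilder;
using kudu::client::KuduInsert;
using kudu::client::KuduSession;
using kudu::client::KuduTable;
using kudu::client::sp::shared_ptr;

// Insert a single row into impala_crash through the long-lived C++ client.
Status WriteRow(const shared_ptr<KuduClient>& client,
                const std::string& dt, const std::string& col) {
  shared_ptr<KuduTable> table;
  Status s = client->OpenTable("impala_crash", &table);
  if (!s.ok()) return s;

  shared_ptr<KuduSession> session = client->NewSession();
  // The op is routed using the tablet id resolved from the client's metacache;
  // if that entry still refers to the dropped tablet, the op keeps carrying
  // the old tablet id as described above.
  KuduInsert* insert = table->NewInsert();
  s = insert->mutable_row()->SetStringCopy("dt", dt);
  if (!s.ok()) return s;
  s = insert->mutable_row()->SetStringCopy("col", col);
  if (!s.ok()) return s;
  s = session->Apply(insert);
  if (!s.ok()) return s;
  return session->Flush();
}

int main() {
  shared_ptr<KuduClient> client;
  Status s = KuduClientBuilder()
      .add_master_server_addr("kudu-master-host:7051")  // placeholder address
      .Build(&client);
  if (!s.ok()) { std::cerr << s.ToString() << std::endl; return 1; }

  // Step 2 of the repro: the partition for '20230301' exists; this write
  // caches the old tablet id in the metacache.
  s = WriteRow(client, "20230301", "abc");
  if (!s.ok()) { std::cerr << s.ToString() << std::endl; return 1; }

  // Step 3 happens out of band: Impala drops and re-adds the range partition
  // through the Java client, so this C++ client is never told that the tablet
  // behind its cached entry is gone.

  // The next write finds the cached entry marked stale, re-resolves the new
  // tablet id via a lookup RPC, yet the in-flight op still holds the old
  // tablet id, which is the behavior this Jira tracks.
  s = WriteRow(client, "20230301", "abc");
  std::cerr << s.ToString() << std::endl;
  return s.ok() ? 0 : 1;
}
+++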
Several approaches were discussed to address this. The following are captured here for posterity:
+++
1. Maintain a context in Impala that can be shared among different clients. The same context can be used to notify the C++ client to discard its cache when a set of operations could have invalidated it. Simply passing the tablet id may not work, because that may not be enough for a client to make the decision.
2. Impala sends a hint to the C++ client to remove the cache entry after a DDL operation (invoked via the Java client) and perform a remote lookup instead of relying on the local cache (a toy sketch of this idea follows the list).
3. Kudu detects the problem internally and propagates it up to the RPC layer, where the RPC structure is overwritten with the new tablet object and the RPC is retried. This is a tricky and unclean approach and has the potential to introduce bugs.
4. Change the tablet id in the RPC itself. This is non-trivial and error-prone, as the tablet id is declared const, and the RPC, batcher, and client implementations assume the tablet id becomes read-only once the RPC is registered for an incoming op.
+++
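To illustrate approach #2, below is a self-contained toy model of the idea. The names (ToyMetaCache, LookupTablet, InvalidateRange) are made up for illustration only and do not exist in the Kudu client API; the point is simply that an explicit invalidation hint after a DDL forces the next lookup to go remote instead of trusting the local cache.
+++
// Self-contained toy model of approach #2 (not Kudu code): the caller sends an
// explicit invalidation hint after a DDL, so the next lookup goes back to the
// master instead of trusting a cached tablet id.
#include <iostream>
#include <string>
#include <unordered_map>

// Stand-in for the master: maps a range's partition key to its current tablet.
std::unordered_map<std::string, std::string> master = {
    {"20230301", "tablet-OLD"}};

class ToyMetaCache {
 public:
  // Fast path when cached; otherwise a "remote" lookup against the master map.
  std::string LookupTablet(const std::string& range) {
    auto it = cache_.find(range);
    if (it != cache_.end()) return it->second;   // fast-path lookup
    std::string id = master.at(range);           // remote lookup
    cache_[range] = id;
    return id;
  }

  // Approach #2: the DDL hint removes the entry outright, instead of leaving a
  // stale entry whose old tablet id can keep being picked up.
  void InvalidateRange(const std::string& range) { cache_.erase(range); }

 private:
  std::unordered_map<std::string, std::string> cache_;
};

int main() {
  ToyMetaCache cache;
  std::cout << cache.LookupTablet("20230301") << std::endl;  // tablet-OLD

  // DDL via the Java client: drop + re-add the range, yielding a new tablet.
  master["20230301"] = "tablet-NEW";

  // With the hint from Impala, the stale entry is gone and the next lookup
  // resolves the new tablet id; without it, tablet-OLD would still be served.
  cache.InvalidateRange("20230301");
  std::cout << cache.LookupTablet("20230301") << std::endl;  // tablet-NEW
  return 0;
}
+++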
Approach #1 or #2 is more likely to be finalised than the rest. Hence, two Jiras may be required to track the Kudu work and the Impala work separately.
This Jira will be used to track the Kudu-side work.
For the Impala-side work, the following Jira will be used:
https://issues.apache.org/jira/browse/IMPALA-12172