[IMPALA-5953] Consider moving wholesale to kudu::Thread/ThreadPool - ASF JIRA

Attach files

Attach Screenshot

Add vote

Voters

Watch issue

Watchers

Create sub-task

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels:

Epic Link:
Improve Hang Diagnostics
Epic Color:
ghx-label-6

Description

I went over the effort required to switch to kudu::Thread and kudu::ThreadPool completely and I've broken it down into some high level tasks, which are listed below.

A rough estimate of time needed for this task with one person on it is 1 week without too many distractions.

Diffs between kudu::Thread and impala::Thread:

kudu::Thread use pthreads, whereas impala::Thread uses boost::thread(). Need to measure for performance differences. Effort: High - for perf evaluation
kudu::Thread has no ThreadGroup equivalent. Effort: Easy
kudu::Thread needs to use a ThreadJoiner to join threads. Effort: Easy
kudu::Thread uses kudu::MetricEntity whereas impala::Thread uses impala::MetricGroup. Effort: Moderate
kudu::Thread does not understand impala::Webserver. Needs to be implemented inside Kudu. Effort: Moderate/High

Diffs between kudu::ThreadPool impala::ThreadPool:

kudu::ThreadPool has very different APIs than impala::ThreadPool

kudu::ThreadPool has no CallableThreadPool, but has SubmitFunction() which tries to achieve the same thing. Effort: Moderate
kudu::ThreadPool does not have a default worker task that we can assign at object creation time.
We either need to convert everything to Submit(Runnable) or SubmitFunc(). Effort: High, and we need to check if this affects performance in any way due to the increased rate of passing around function pointers.

---------------

Below, I list the things we'd need to add to the kudu::Thread code and differentiate between the possibility of having a shared utility repo for Impala and Kudu, versus just periodically updating the kudu utils code inside Impala:

Add a ThreadGroup class (Easy, non-invasive)
Add a ThreadPool type that takes a worker task on object creation. Only if needed. (Easy, non-invasive)
Add a Impala specific instrumentation functions for metrics. (Easy, but slightly invasive)
Add a impala Webserver callback for kudu::Thread (Moderate/High, quite invasive as we need to include an Impala header inside kudu) I already got this working 80% last night.
An alternative would be to have a function that exports all thread information, and have another callback from the Impala code registered that uses this exported information to display on the webpage. That way, we avoid including Impala headers inside Kudu. But this would involve more work since we'd need to add a struct/class that's understood by both Impala and Kudu.

Tasks 1 and 2 are only "adding" to the code and not modifying any existing code.

Shared repo: If we're sharing a repo, it can just be added to that repo as new APIs.
Periodic update: It will be easy to keep moving this to the newer copies since we can pretty much copy paste, unless the kudu::Thread APIs change, which I doubt they will.

Task 3 is slightly invasive:

Shared repo: If we're sharing a repo, then we will most likely need to standardize on other things as well, such as Metrics, in which case, this wouldn't be invasive any more, and we can just use Kudu's current instrumentation code.
Periodic update: Even though it's slightly invasive, it wouldn't be hard to keep a track of and move the diffs to the newer copies.

Task 4:

Shared repo: If we're sharing a repo, then we will have to go with the suggestion of exporting thread instrumentation as I suggested above.
Periodic update: It wouldn't be hard to keep copying to the newer copies, since the only invasive part is the adding of the Impala header.