I went over the effort required to switch to kudu::Thread and kudu::ThreadPool completely and I've broken it down into some high level tasks, which are listed below.
A rough estimate of time needed for this task with one person on it is 1 week without too many distractions.
- kudu::Thread use pthreads, whereas impala::Thread uses boost::thread(). Need to measure for performance differences. Effort: High - for perf evaluation
- kudu::Thread has no ThreadGroup equivalent. Effort: Easy
- kudu::Thread needs to use a ThreadJoiner to join threads. Effort: Easy
- kudu::Thread uses kudu::MetricEntity whereas impala::Thread uses impala::MetricGroup. Effort: Moderate
- kudu::Thread does not understand impala::Webserver. Needs to be implemented inside Kudu. Effort: Moderate/High
kudu::ThreadPool has very different APIs than impala::ThreadPool
- kudu::ThreadPool has no CallableThreadPool, but has SubmitFunction() which tries to achieve the same thing. Effort: Moderate
- kudu::ThreadPool does not have a default worker task that we can assign at object creation time.
- We either need to convert everything to Submit(Runnable) or SubmitFunc(). Effort: High, and we need to check if this affects performance in any way due to the increased rate of passing around function pointers.
Below, I list the things we'd need to add to the kudu::Thread code and differentiate between the possibility of having a shared utility repo for Impala and Kudu, versus just periodically updating the kudu utils code inside Impala:
- Add a ThreadGroup class (Easy, non-invasive)
- Add a ThreadPool type that takes a worker task on object creation. Only if needed. (Easy, non-invasive)
- Add a Impala specific instrumentation functions for metrics. (Easy, but slightly invasive)
- Add a impala Webserver callback for kudu::Thread (Moderate/High, quite invasive as we need to include an Impala header inside kudu) I already got this working 80% last night.
An alternative would be to have a function that exports all thread information, and have another callback from the Impala code registered that uses this exported information to display on the webpage. That way, we avoid including Impala headers inside Kudu. But this would involve more work since we'd need to add a struct/class that's understood by both Impala and Kudu.
Tasks 1 and 2 are only "adding" to the code and not modifying any existing code.
- If we're sharing a repo, it can just be added to that repo as new APIs.
- It will be easy to keep moving this to the newer copies since we can pretty much copy paste, unless the kudu::Thread APIs change, which I doubt they will.
Task 3 is slightly invasive:
- If we're sharing a repo, then we will most likely need to standardize on other things as well, such as Metrics, in which case, this wouldn't be invasive any more, and we can just use Kudu's current instrumentation code.
- Even though it's slightly invasive, it wouldn't be hard to keep a track of and move the diffs to the newer copies.
- If we're sharing a repo, then we will have to go with the suggestion of exporting thread instrumentation as I suggested above.
- It wouldn't be hard to keep copying to the newer copies, since the only invasive part is the adding of the Impala header.