Details
Improvement
Status: Patch Available
Minor
Resolution: Unresolved
5.1.0
None
Description
In our production environment, the Hive-Phoenix connector takes nearly 30 to 40 minutes to generate splits for a large Phoenix table with more than 2048 regions. This is because the 'generateSplits' method in class PhoenixInputFormat uses only one thread, generating the splits for each scan one after another. My proposal is to use multiple threads to generate splits in parallel. The proposal has been validated in our production environment: after changing the code to generate splits in parallel with 24 threads, the time cost was reduced to 2 minutes.
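A minimal sketch of the proposed approach, assuming the per-scan split generation can be expressed as an independent function of one scan. The class name ParallelSplitGenerator, the method generateSplitsInParallel and its parameters are hypothetical, and generic types S and T stand in for the real scan and input-split types so the example compiles without Phoenix or Hadoop on the classpath. Each scan is submitted to a fixed-size thread pool, and results are collected in input order so the split ordering matches the single-threaded version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelSplitGenerator {
    // Hypothetical helper: generates splits for every scan on a fixed-size
    // thread pool and returns them in the same order as the input scans.
    public static <S, T> List<T> generateSplitsInParallel(
            List<S> scans, Function<S, List<T>> splitsForScan, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per scan; the tasks run concurrently.
            List<Future<List<T>>> futures = new ArrayList<>(scans.size());
            for (S scan : scans) {
                futures.add(pool.submit(() -> splitsForScan.apply(scan)));
            }
            // Wait for each task in submission order to keep splits ordered.
            List<T> splits = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                splits.addAll(future.get()); // a task failure surfaces as ExecutionException
            }
            return splits;
        } finally {
            // Stop the pool, cancelling any remaining tasks if one failed.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> scans = List.of("scan-0", "scan-1", "scan-2");
        // Stand-in for the real per-scan work (e.g. region lookups).
        List<String> splits = generateSplitsInParallel(
                scans, s -> List.of(s + "/split-a", s + "/split-b"), 24);
        System.out.println(splits.size()); // 6
        System.out.println(splits.get(0)); // scan-0/split-a
    }
}
```

The pool size (24 here, matching the thread count reported above) would likely need to be configurable, since the right value depends on cluster size and how much of the per-scan work is waiting on region metadata.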
Attachments
Issue Links
- links to