  1. Phoenix
  2. PHOENIX-6698

hive-connector takes a long time to generate splits for large Phoenix tables


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.1.0
    • Fix Version/s: connectors-6.0.0
    • Component/s: hive-connector
    • Labels: None

    Description

      In our production environment, the hive-phoenix connector takes 30-40 minutes to generate splits for a large Phoenix table that has more than 2048 regions. This is because the 'generateSplits' function in class PhoenixInputFormat uses only one thread to generate the splits for each scan. My proposal is to use multiple threads to generate the splits in parallel. The proposal has been validated in our production environment: after changing the code to generate splits in parallel with 24 threads, the time cost dropped to 2 minutes.
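
      The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class ParallelSplitSketch, its generateSplits method, and the splitsForScan callback are hypothetical names, and the generic types S and T merely stand in for a scan and an input split. Each scan is submitted to a fixed-size thread pool, and the results are collected in submission order so the final list matches what a single-threaded loop would produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch; names and types are assumptions, not Phoenix APIs.
final class ParallelSplitSketch {

    // S stands in for a scan, T for an input split.
    // splitsForScan is the per-scan work that previously ran on one thread.
    static <S, T> List<T> generateSplits(List<S> scans,
                                         int threads,
                                         Function<S, List<T>> splitsForScan) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per scan; the pool runs up to `threads` at once.
            List<Future<List<T>>> futures = new ArrayList<>();
            for (S scan : scans) {
                Callable<List<T>> task = () -> splitsForScan.apply(scan);
                futures.add(pool.submit(task));
            }
            // Collect in submission order, so output order matches scan order.
            List<T> splits = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                splits.addAll(future.get());
            }
            return splits;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while generating splits", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Split generation failed", e.getCause());
        } finally {
            // Always release the worker threads, even on failure.
            pool.shutdownNow();
        }
    }
}
```

      A call such as ParallelSplitSketch.generateSplits(scans, 24, scan -> computeSplits(scan)) would mirror the 24-thread setup described above, where computeSplits is a placeholder for whatever per-scan logic the input format already performs. In a real patch the thread count would presumably be read from a configuration property rather than hard-coded.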

            People

              Assignee: jichen0919 jichen
              Reporter: jichen0919 jichen
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: