  1. Kylin
  2. KYLIN-1323

Improve performance of converting data to hfile

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.2
    • Fix Version/s: v1.3.0, v1.4.0, v1.5.2
    • Component/s: Job Engine
    • Labels:
      None

      Description

      Suppose we have 100GB of data after cuboid building and a setting of 10GB per region. Currently, 10 split keys are calculated, 10 regions are created, and 10 reducers are used in the ‘convert to hfile’ step.

      With this optimization, we could calculate 100 (or more) split keys and use all of them in the ‘convert to hfile’ step, while sampling only 10 of them to create regions. The result is still 10 regions, but 100 reducers are used in the ‘convert to hfile’ step. Of course, 100 hfiles are created as well, and 10 files are loaded per region. That should be fine and should not affect query performance dramatically.
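
      The sampling idea can be sketched roughly as follows. This is an illustration in Python, not the code from the attached patches; the function names `sample_region_keys` and `count_hfiles_per_region` and the integer keys are hypothetical. The sketch counts boundaries strictly, so 100 reducers need 99 split keys and 10 regions need 9, whereas the description above counts them loosely as 100 and 10.

```python
from bisect import bisect_right


def sample_region_keys(reducer_keys, hfiles_per_region):
    """Pick every `hfiles_per_region`-th reducer boundary as a region boundary.

    `reducer_keys` is the sorted list of fine-grained split keys; key i
    separates reducer i from reducer i + 1. (Hypothetical helper.)
    """
    if hfiles_per_region < 1:
        raise ValueError("hfiles_per_region must be >= 1")
    # The boundary closing a group of k reducers sits at index k-1, 2k-1, ...
    return reducer_keys[hfiles_per_region - 1::hfiles_per_region]


def count_hfiles_per_region(reducer_keys, region_keys):
    """Count how many reducer outputs (hfiles) fall into each region."""
    counts = [0] * (len(region_keys) + 1)
    counts[0] += 1  # reducer 0 has no lower bound, so it belongs to region 0
    for start_key in reducer_keys:
        # Each later reducer starts at one of the fine-grained split keys.
        counts[bisect_right(region_keys, start_key)] += 1
    return counts


# Illustrative integer keys: 99 boundaries give 100 reducers.
reducer_keys = list(range(1, 100))
region_keys = sample_region_keys(reducer_keys, 10)
hfile_counts = count_hfiles_per_region(reducer_keys, region_keys)

print(len(reducer_keys) + 1, "reducers,", len(region_keys) + 1, "regions")
print("hfiles per region:", hfile_counts)
```

      Because every region boundary is also a reducer boundary, no hfile spans two regions, which is what lets each region load its files directly.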

        Attachments

        1. KYLIN-1323-2.x-staging.2.patch
          49 kB
          Yerui Sun
        2. KYLIN-1323-1.x-staging.patch
          18 kB
          Yerui Sun
        3. KYLIN-1323-1.x-staging.2.patch
          24 kB
          Yerui Sun

            People

            • Assignee:
              sunyerui Yerui Sun
              Reporter:
              sunyerui Yerui Sun
            • Votes:
              0
              Watchers:
              8
