Kylin / KYLIN-702

When Kylin creates the flat Hive table, it generates a large number of small files in HDFS

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v0.7.1, v1.0, v1.1, v1.1.1
    • Fix Version/s: v1.2, v1.4.0
    • Component/s: Others
    • Labels:
      None

      Description

      When I built a cube, I noticed that the dictionary-building and cube-calculation steps started a large number of mappers (more than 10,000). From the logs I saw that many mappers had zero or very few records to process, which confused me.

      I then checked the storage location of the flat table and found many files there. I counted them, and the number of files matched the number of mappers.
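
      A one-to-one match between files and mappers is what a simple split model predicts. The sketch below is only an illustration, assuming an input format that never combines files and produces at least one split per file (even an empty one), with a hypothetical 128 MB split size; the file sizes are made up, and real input formats apply extra rules that are ignored here.

```python
import math

MB = 1024 * 1024


def estimate_mappers(file_sizes, split_size=128 * MB):
    """Estimate map tasks when splits never span files.

    Assumption: every file yields at least one split, even if it is empty.
    """
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


# Hypothetical flat table: 6,000 empty files plus 4,000 files of 64 KB each
small_files = [0] * 6000 + [64 * 1024] * 4000
# Hypothetical merged layout: 4 files of 256 MB each
merged_files = [256 * MB] * 4

print(estimate_mappers(small_files))   # 10000
print(estimate_mappers(merged_files))  # 8
```

      Under these assumptions the mapper count tracks the file count whenever the files are smaller than one split, regardless of how little data they hold.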

      Too many mappers cause a lot of overhead and degrade the cluster's performance. Kylin should ask Hive to merge those small files during the flat table creation step.

      In my Hadoop cluster, hive.merge.mapredfiles was set to false (the default value). After changing it to true for Kylin's job, the number of files in the intermediate table dropped to 4, each up to 256 MB, which looks good. See the Hive configuration documentation at: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
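
      One way to apply such a setting per job, rather than cluster-wide, is to prefix the Hive script with session-level SET statements. The sketch below is an assumption about how that could be done, not a description of what Kylin does. The helper name, the example statement, and the table names are hypothetical. The property names come from the Hive configuration documentation; the size values shown are Hive's documented defaults.

```python
# Hive merge-related properties. Only hive.merge.mapredfiles is changed from
# its default (false); the other values are Hive's documented defaults.
MERGE_SETTINGS = {
    "hive.merge.mapfiles": "true",
    "hive.merge.mapredfiles": "true",
    "hive.merge.size.per.task": "256000000",
    "hive.merge.smallfiles.avgsize": "16000000",
}


def with_merge_settings(hql, settings=MERGE_SETTINGS):
    """Return the HQL script prefixed with one SET statement per property."""
    preamble = "".join(f"SET {key}={value};\n" for key, value in settings.items())
    return preamble + hql


# Hypothetical flat-table statement; the table names are placeholders.
script = with_merge_settings(
    "INSERT OVERWRITE TABLE example_flat_table SELECT * FROM example_fact;"
)
print(script)
```

      The resulting script could then be passed to the Hive CLI, so the merge settings apply only to that session and leave the cluster defaults untouched.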

            People

            • Assignee: Shao Feng Shi (shaofengshi)
            • Reporter: Shao Feng Shi (shaofengshi)
