Kylin / KYLIN-702

When Kylin creates the flat Hive table, it generates a large number of small files in HDFS


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v0.7.1, v1.0, v1.1, v1.1.1
    • Fix Version/s: v1.2, v1.4.0
    • Component/s: Others
    • Labels: None

    Description

      When building a cube, I noticed that a large number of mappers (more than 10,000) were started in the steps that build the dictionary and calculate the cube. The logs showed that many of these mappers had zero or very few records to process, which confused me.

      I then checked the storage location of the flat table and found many files there. Counting them showed that the number of files was the same as the number of mappers.
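The match between file count and mapper count follows from how Hadoop's `FileInputFormat` plans input splits: every file yields at least one split, so thousands of tiny files mean thousands of map tasks. The sketch below approximates that rule, assuming a splittable, uncompressed input format that does not combine files and a 128 MB split size (the default HDFS block size in Hadoop 2.x). `splits_for_file` and `estimate_map_tasks` are hypothetical helpers, not part of Kylin or Hadoop.

```python
# Approximation of Hadoop FileInputFormat split planning (assumption: the
# input format does not combine small files). Full-size splits are cut while
# the remaining bytes exceed SPLIT_SLOP times the split size; the tail
# becomes one final split.
SPLIT_SLOP = 1.1
MB = 1024 * 1024


def splits_for_file(size_bytes, split_size_bytes=128 * MB):
    """Approximate number of input splits for one splittable file."""
    if size_bytes == 0:
        return 1  # a zero-length file still produces one (empty) split
    splits = 0
    remaining = size_bytes
    while remaining / split_size_bytes > SPLIT_SLOP:
        splits += 1
        remaining -= split_size_bytes
    if remaining != 0:
        splits += 1
    return splits


def estimate_map_tasks(file_sizes_bytes, split_size_bytes=128 * MB):
    """Hypothetical helper: one map task per split, summed over all files."""
    return sum(splits_for_file(s, split_size_bytes) for s in file_sizes_bytes)


# 10,000 one-megabyte files versus 4 files of 256 MB each.
small_files = [1 * MB] * 10_000
merged_files = [256 * MB] * 4
print(estimate_map_tasks(small_files))   # 10000: one mapper per tiny file
print(estimate_map_tasks(merged_files))  # 8: each 256 MB file spans two 128 MB splits
```

The mapper count is driven by the number of files, not by the amount of data, which is why mappers with zero or very few records show up.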

      Too many mappers cause significant overhead and degrade the cluster's performance. Kylin should ask Hive to merge these small files during the step that creates the flat table.

      In my Hadoop cluster, hive.merge.mapredfiles was set to false (the default value). After setting it to true for Kylin's job, the number of files in the intermediate table dropped to 4, each up to 256 MB, which looks good. See the Hive configuration documentation at: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
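One way to scope the change to Kylin's job, rather than the whole cluster, is to prefix the flat-table statement with Hive `SET` commands. The sketch below is a hypothetical Python helper, not Kylin's actual implementation. The property names are real Hive settings; the two size values are Hive's documented defaults (`hive.merge.size.per.task` defaults to about 256 MB, consistent with the file sizes reported above). The table names in the usage line are placeholders.

```python
# Hive properties that ask Hive to merge small output files at the end of a
# job. Keys are real Hive settings; values here are illustrative.
MERGE_SETTINGS = {
    "hive.merge.mapfiles": "true",                # merge outputs of map-only jobs
    "hive.merge.mapredfiles": "true",             # merge outputs of map-reduce jobs (default false)
    "hive.merge.size.per.task": "256000000",      # target size of merged files, in bytes
    "hive.merge.smallfiles.avgsize": "16000000",  # merge when the average output file is smaller
}


def with_hive_merge_settings(statement, settings=MERGE_SETTINGS):
    """Hypothetical helper: prefix one HiveQL statement with SET commands."""
    lines = [f"SET {key}={value};" for key, value in settings.items()]
    # Normalize the trailing semicolon so the statement ends with exactly one.
    lines.append(statement.rstrip().rstrip(";") + ";")
    return "\n".join(lines)


print(with_hive_merge_settings(
    "INSERT OVERWRITE TABLE flat_table_demo SELECT * FROM fact_table"
))
```

Because `SET` only affects the current Hive session, the rest of the cluster keeps its default behavior.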


People

    • Assignee: Shao Feng Shi
    • Reporter: Shao Feng Shi
    • Votes: 0
    • Watchers: 1
