Apache Hudi / HUDI-1398

Align insert file size for reducing IO


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None

      Description

      Currently, if any insert records are left unassigned, we write totalUnassignedInserts into new files

      and set the number of records for each new bucket as follows:

      recordsPerBucket.add(totalUnassignedInserts / insertBuckets); (https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java L 188)

      This only computes the average number of records per bucket, so it may create new small files.

      For example:

      totalUnassignedInserts = 250

      insertRecordsPerBucket = 120

      so insertBuckets = 3 (e.g. file_a, file_b, file_c)

      then file_a = file_b = file_c = 83 (250 / 3, integer division)

      All three of these files will then count as small files when the next delta is processed,

      whereas we could reduce IO by setting file_a = 120, file_b = 120, file_c = 10, which leaves only file_c below the target size.
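
The difference between the two strategies can be sketched as follows. This is only an illustration of the arithmetic in the description, not the actual UpsertPartitioner code: the class name InsertBucketSizing and its methods are hypothetical, and the bucket count is assumed to be the ceiling of totalUnassignedInserts / insertRecordsPerBucket, which matches the example (250 / 120 gives 3 buckets).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi: compares the two sizing strategies.
public class InsertBucketSizing {

    // Assumed bucket count: ceiling of total / perBucket (perBucket must be > 0).
    public static int numBuckets(long totalUnassignedInserts, long insertRecordsPerBucket) {
        return (int) ((totalUnassignedInserts + insertRecordsPerBucket - 1) / insertRecordsPerBucket);
    }

    // Behaviour described in the issue: every new bucket gets the average.
    public static List<Long> averageSizing(long totalUnassignedInserts, long insertRecordsPerBucket) {
        int insertBuckets = numBuckets(totalUnassignedInserts, insertRecordsPerBucket);
        List<Long> recordsPerBucket = new ArrayList<>();
        for (int b = 0; b < insertBuckets; b++) {
            recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
        }
        return recordsPerBucket;
    }

    // Proposed behaviour: fill each bucket up to insertRecordsPerBucket and
    // put whatever is left over into the last bucket.
    public static List<Long> alignedSizing(long totalUnassignedInserts, long insertRecordsPerBucket) {
        int insertBuckets = numBuckets(totalUnassignedInserts, insertRecordsPerBucket);
        List<Long> recordsPerBucket = new ArrayList<>();
        long remaining = totalUnassignedInserts;
        for (int b = 0; b < insertBuckets; b++) {
            long assigned = Math.min(insertRecordsPerBucket, remaining);
            recordsPerBucket.add(assigned);
            remaining -= assigned;
        }
        return recordsPerBucket;
    }

    public static void main(String[] args) {
        System.out.println(averageSizing(250, 120)); // [83, 83, 83]
        System.out.println(alignedSizing(250, 120)); // [120, 120, 10]
    }
}
```

With the aligned strategy, at most one of the new files ends up below the target size, so fewer files should need to be revisited as small files later.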


            People

            • Assignee: Unassigned
            • Reporter: henryz steven zhang
