Apache Hudi / HUDI-574

CLI counts small file inserts as updates

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.2
    • Component/s: cli

    Description

      User report:

      I'm trying to understand the .commit output and how it relates to the output of the hudi-cli tool, and I'm finding it difficult to reconcile the two. Specifically, I want to know the number of updates/inserts/deletes across all partitions for a given commit (an upsert). From the CLI:
      hudi:exec_unit_ver->commit showpartitions --commit 20200108153617
      ╔════════════════╤═══════════════════╤═════════════════════╤════════════════════════╤═══════════════════════╤═════════════════════╤══════════════╗
      ║ Partition Path │ Total Files Added │ Total Files Updated │ Total Records Inserted │ Total Records Updated │ Total Bytes Written │ Total Errors ║
      ╠════════════════╪═══════════════════╪═════════════════════╪════════════════════════╪═══════════════════════╪═════════════════════╪══════════════╣
      ║ 0              │ 0                 │ 9                   │ 0                      │ 2091                  │ 983.7 MB            │ 0            ║
      ╟────────────────┼───────────────────┼─────────────────────┼────────────────────────┼───────────────────────┼─────────────────────┼──────────────╢
      But in the 20200108153617.commit file for that commit, one of the files in partition "0" has
      "numInserts" : 44448,
      so I'm not sure why Total Records Inserted is reported as zero. I checked that the sum of numUpdateWrites across all files in the partition matches 2091. More generally, I think it would be helpful to have totalRecordsInserted, totalRecordsUpdated, and totalRecordsDeleted in the commit metadata (although it's not a big issue to sum the individual numbers from each file in each partition).
       
      vinoth
       
      On the counts: when I checked the code, it's counting the inserts as updates, since Hudi packed them onto existing files to honor the target file size:
      for (HoodieWriteStat stat : stats) {
        if (stat.getPrevCommit().equals(HoodieWriteStat.NULL_COMMIT)) {
          totalFilesAdded += 1;
          totalRecordsInserted += stat.getNumWrites();
        } else {
          totalFilesUpdated += 1;
          totalRecordsUpdated += stat.getNumUpdateWrites();
        }
        totalBytesWritten += stat.getTotalWriteBytes();
        totalWriteErrors += stat.getTotalWriteErrors();
      }
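To make the effect of that branch concrete, here is a minimal, self-contained sketch. It is not the Hudi source and not necessarily the fix that was merged: `PartitionCounts` and its nested `WriteStat` are hypothetical stand-ins that carry only the fields discussed above, the `"null"` marker is an assumed value for `NULL_COMMIT`, and the sample figures are made up except for 44448 and 2091, which come from the report. `byFileKind` mirrors the loop quoted above; `byRecordKind` shows one possible alternative that takes record counts from the per-file insert and update counters regardless of whether the file is new.

```java
import java.util.Arrays;
import java.util.List;

final class PartitionCounts {
    // Hypothetical stand-in for the few per-file write stats used in this sketch.
    static final class WriteStat {
        static final String NULL_COMMIT = "null"; // assumed marker for "file has no previous commit"
        final String prevCommit;
        final long numWrites;        // all records written to the file
        final long numInserts;       // records newly inserted into the file
        final long numUpdateWrites;  // records that updated existing keys

        WriteStat(String prevCommit, long numWrites, long numInserts, long numUpdateWrites) {
            this.prevCommit = prevCommit;
            this.numWrites = numWrites;
            this.numInserts = numInserts;
            this.numUpdateWrites = numUpdateWrites;
        }
    }

    long filesAdded;
    long filesUpdated;
    long recordsInserted;
    long recordsUpdated;

    // Mirrors the quoted loop: inserts are counted only for brand-new files.
    static PartitionCounts byFileKind(List<WriteStat> stats) {
        PartitionCounts c = new PartitionCounts();
        for (WriteStat stat : stats) {
            if (WriteStat.NULL_COMMIT.equals(stat.prevCommit)) {
                c.filesAdded += 1;
                c.recordsInserted += stat.numWrites;
            } else {
                c.filesUpdated += 1;
                c.recordsUpdated += stat.numUpdateWrites;
            }
        }
        return c;
    }

    // Alternative sketch: file counts still depend on the previous commit,
    // but record counts come from the per-file counters in every case.
    static PartitionCounts byRecordKind(List<WriteStat> stats) {
        PartitionCounts c = new PartitionCounts();
        for (WriteStat stat : stats) {
            if (WriteStat.NULL_COMMIT.equals(stat.prevCommit)) {
                c.filesAdded += 1;
            } else {
                c.filesUpdated += 1;
            }
            c.recordsInserted += stat.numInserts;
            c.recordsUpdated += stat.numUpdateWrites;
        }
        return c;
    }

    public static void main(String[] args) {
        // Two existing files (illustrative previous commit time); one of them
        // also received 44448 small-file inserts.
        List<WriteStat> stats = Arrays.asList(
            new WriteStat("20200107000000", 50000L, 44448L, 91L),
            new WriteStat("20200107000000", 30000L, 0L, 2000L));

        PartitionCounts quoted = byFileKind(stats);
        PartitionCounts alternative = byRecordKind(stats);
        System.out.println("quoted logic: inserted=" + quoted.recordsInserted
            + " updated=" + quoted.recordsUpdated);
        System.out.println("alternative:  inserted=" + alternative.recordsInserted
            + " updated=" + alternative.recordsUpdated);
    }
}
```

With this sample input the quoted logic reports 0 inserted and 2091 updated records, matching the CLI output in the report, while the alternative reports 44448 inserted and 2091 updated.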
       

            People

              Assignee: lamber-ken
              Reporter: Vinoth Chandar (vinoth)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m