Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
User report:
I'm trying to understand the .commit output and how it relates to the output of the hudi-cli tool, and I'm finding it difficult to reconcile my findings. Specifically, I want to know the number of updates/inserts/deletes across all partitions for a given commit (an upsert). From the CLI:
hudi:exec_unit_ver->commit showpartitions --commit 20200108153617
╔════════════════╤═══════════════════╤═════════════════════╤════════════════════════╤═══════════════════════╤═════════════════════╤══════════════╗
║ Partition Path │ Total Files Added │ Total Files Updated │ Total Records Inserted │ Total Records Updated │ Total Bytes Written │ Total Errors ║
╠════════════════╪═══════════════════╪═════════════════════╪════════════════════════╪═══════════════════════╪═════════════════════╪══════════════╣
║ 0 │ 0 │ 9 │ 0 │ 2091 │ 983.7 MB │ 0 ║
╟────────────────┼───────────────────┼─────────────────────┼────────────────────────┼───────────────────────┼─────────────────────┼──────────────╢
However, in the 20200108153617.commit file for that commit, one of the files in partition "0" has:
"numInserts" : 44448,
so I'm not sure why Total Records Inserted is reported as zero. I checked that the sum of numUpdateWrites across all files in the partition matches 2091. In general, I think it would be helpful to have totalRecordsInserted, totalRecordsUpdated and totalRecordsDeleted in the commit metadata (although it's not a big issue to sum the individual numbers from each file in each partition).
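Summing the per-file counters across partitions only takes a couple of loops. Below is a minimal Java sketch, assuming a hypothetical FileStat stand-in whose fields mirror the numInserts and numUpdateWrites keys of the .commit file (numDeletes is an assumed field), and a map from partition path to the stats of the files in that partition. It is not Hudi's API, just the arithmetic:

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: commit-wide record totals summed from per-file stats.
class CommitTotals {
  // Hypothetical stand-in for one per-file entry in a .commit file; field names
  // mirror the numInserts/numUpdateWrites keys, numDeletes is assumed.
  record FileStat(long numInserts, long numUpdateWrites, long numDeletes) {}

  // The totals the report suggests storing in the commit metadata.
  record Totals(long totalRecordsInserted, long totalRecordsUpdated, long totalRecordsDeleted) {}

  // Sums every file's counters across every partition of one commit.
  static Totals sum(Map<String, List<FileStat>> partitionToStats) {
    long inserted = 0, updated = 0, deleted = 0;
    for (List<FileStat> files : partitionToStats.values()) {
      for (FileStat f : files) {
        inserted += f.numInserts();
        updated += f.numUpdateWrites();
        deleted += f.numDeletes();
      }
    }
    return new Totals(inserted, updated, deleted);
  }

  public static void main(String[] args) {
    // Illustrative numbers only, not taken from the commit in this report.
    Map<String, List<FileStat>> commit = Map.of(
        "0", List.of(new FileStat(100, 20, 0), new FileStat(0, 5, 1)),
        "1", List.of(new FileStat(10, 0, 2)));
    System.out.println(sum(commit));
  }
}
```

Reading a real .commit file would additionally need a JSON parser; the sketch only covers the aggregation step.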
vinoth
On the counts: when I checked the code, it's counting the inserts as updates, since Hudi packed them onto existing files to honor the target file size.
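To make "packed onto existing files" concrete, here is a simplified Java model of small-file handling. It is a sketch under stated assumptions, not Hudi's partitioner: the parameters smallFileLimitBytes, targetFileBytes and avgRecordBytes are hypothetical names, file sizes are plain byte counts, and every record is assumed to have the same average size. New inserts first fill the headroom of existing files under the small-file limit, and only the remainder goes to brand-new files:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of packing new inserts into existing small files.
class SmallFilePacking {
  // How many inserts one file receives, and whether that file already existed.
  record Assignment(String fileId, boolean existingFile, long inserts) {}

  // Assumes avgRecordBytes > 0.
  static List<Assignment> assign(long[] existingFileBytes, long newInserts,
      long smallFileLimitBytes, long targetFileBytes, long avgRecordBytes) {
    List<Assignment> out = new ArrayList<>();
    long remaining = newInserts;
    // 1) Fill the headroom of existing files that are below the small-file limit.
    for (int i = 0; i < existingFileBytes.length && remaining > 0; i++) {
      if (existingFileBytes[i] >= smallFileLimitBytes) {
        continue; // already large enough; not a packing candidate
      }
      long headroom = Math.max(0, targetFileBytes - existingFileBytes[i]);
      long fit = Math.min(remaining, headroom / avgRecordBytes);
      if (fit > 0) {
        out.add(new Assignment("existing-" + i, true, fit));
        remaining -= fit;
      }
    }
    // 2) Put whatever is left into brand-new files of at most the target size.
    long perNewFile = Math.max(1, targetFileBytes / avgRecordBytes);
    for (int n = 0; remaining > 0; n++) {
      long take = Math.min(remaining, perNewFile);
      out.add(new Assignment("new-" + n, false, take));
      remaining -= take;
    }
    return out;
  }

  public static void main(String[] args) {
    // Illustrative sizes: files of 60 and 95 bytes, limit 80, target 100, 10-byte records.
    for (Assignment a : assign(new long[] {60, 95}, 12, 80, 100, 10)) {
      System.out.println(a);
    }
  }
}
```

In this model, every assignment with existingFile = true lands in a file that already has a previous commit, which is the case the CLI loop that follows classifies as an updated file.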
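for (HoodieWriteStat stat : stats) {
  if (stat.getPrevCommit().equals(HoodieWriteStat.NULL_COMMIT)) {
    totalFilesAdded += 1;
    totalRecordsInserted += stat.getNumWrites();
  } else {
    // existing file: only update writes are tallied; inserts packed into it are not
    totalFilesUpdated += 1;
    totalRecordsUpdated += stat.getNumUpdateWrites();
  }
  totalBytesWritten += stat.getTotalWriteBytes();
  totalWriteErrors += stat.getTotalWriteErrors();
}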
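If Total Records Inserted should also reflect inserts packed into existing files, one option is to keep the file-level classification (added vs. updated, decided by the previous commit) but take record counts from each file's own counters. A minimal Java sketch of that idea, using a hypothetical Stat stand-in rather than the real HoodieWriteStat, and assuming that numInserts is populated for every file and that the "no previous commit" sentinel is the string "null". This is an illustration, not necessarily how this issue was fixed:

```java
import java.util.List;

// Hypothetical per-partition tally that separates file classification from record counts.
class PartitionTally {
  // Assumed sentinel for "no previous commit" (stand-in for HoodieWriteStat.NULL_COMMIT).
  static final String NULL_COMMIT = "null";

  // Hypothetical stand-in for HoodieWriteStat.
  record Stat(String prevCommit, long numInserts, long numUpdateWrites, long numDeletes) {}

  record Tally(long filesAdded, long filesUpdated, long recordsInserted,
      long recordsUpdated, long recordsDeleted) {}

  static Tally tally(List<Stat> stats) {
    long filesAdded = 0, filesUpdated = 0, inserted = 0, updated = 0, deleted = 0;
    for (Stat s : stats) {
      // File-level classification, as in the CLI loop: no previous commit means a new file.
      if (NULL_COMMIT.equals(s.prevCommit())) {
        filesAdded++;
      } else {
        filesUpdated++;
      }
      // Record-level counts come from the per-file counters regardless of file type,
      // so inserts packed into an existing file are still reported as inserts.
      inserted += s.numInserts();
      updated += s.numUpdateWrites();
      deleted += s.numDeletes();
    }
    return new Tally(filesAdded, filesUpdated, inserted, updated, deleted);
  }

  public static void main(String[] args) {
    // Illustrative: one new file, and one existing file that also received packed-in inserts.
    System.out.println(tally(List.of(
        new Stat(NULL_COMMIT, 500, 0, 0),
        new Stat("prev-commit", 300, 40, 2))));
  }
}
```

With this split, an upsert that only packs inserts into existing files would show Total Files Added = 0 but a non-zero Total Records Inserted, matching what the .commit file records.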