The insert query of the query-based MAJOR compaction contains this function call: "validate_acid_sort_order(ROW__ID.writeId, ROW__ID.bucketId, ROW__ID.rowId)".
This call validates that the rows are in the correct order. The validation is done by the GenericUDFValidateAcidSortOrder class, which assumes that the rows are in increasing order by bucketProperty, originalTransactionId and rowId.
However, the rows should be ordered by originalTransactionId, bucketProperty and rowId; otherwise the delete deltas cannot be applied correctly. This is also the order that the MR MAJOR compaction writes, and the order in which the split groups are created for the query-based MAJOR compaction. The mismatch causes no issue as long as each file contains only one bucketProperty, but as soon as a single file contains multiple bucketProperties, the validation fails. This can be reproduced by running multiple merge statements one after another.
The MAJOR compaction will fail with the following error:
So the validation checks the wrong row order. The correct order is originalTransactionId, bucketProperty, rowId.
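
The difference between the two orderings can be illustrated with a minimal, self-contained sketch. This is not the code of GenericUDFValidateAcidSortOrder: the class AcidSortOrderSketch, the RowKey type, the two comparators and the sample values are all hypothetical and exist only to show why a file with more than one bucketProperty passes one ordering and fails the other.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; none of these names exist in Hive.
class AcidSortOrderSketch {

    // Hypothetical stand-in for the three fields passed to validate_acid_sort_order.
    static final class RowKey {
        final long originalTransactionId;
        final int bucketProperty;
        final long rowId;

        RowKey(long originalTransactionId, int bucketProperty, long rowId) {
            this.originalTransactionId = originalTransactionId;
            this.bucketProperty = bucketProperty;
            this.rowId = rowId;
        }
    }

    // Order the validation assumes, per the description: bucketProperty first.
    static final Comparator<RowKey> ASSUMED = Comparator
        .comparingInt((RowKey r) -> r.bucketProperty)
        .thenComparingLong(r -> r.originalTransactionId)
        .thenComparingLong(r -> r.rowId);

    // Order the compaction writes, per the description: originalTransactionId first.
    static final Comparator<RowKey> EXPECTED = Comparator
        .comparingLong((RowKey r) -> r.originalTransactionId)
        .thenComparingInt(r -> r.bucketProperty)
        .thenComparingLong(r -> r.rowId);

    // Returns true if every row is greater than or equal to its predecessor.
    static boolean isSorted(List<RowKey> rows, Comparator<RowKey> order) {
        for (int i = 1; i < rows.size(); i++) {
            if (order.compare(rows.get(i - 1), rows.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // One file holding two bucketProperties (values are illustrative),
    // laid out by originalTransactionId, bucketProperty, rowId.
    static List<RowKey> sample() {
        int bucketA = 536870912;
        int bucketB = 536936448;
        return Arrays.asList(
            new RowKey(1, bucketA, 0),
            new RowKey(1, bucketA, 1),
            new RowKey(1, bucketB, 0),
            new RowKey(2, bucketA, 0)); // bucketProperty goes back down here
    }

    public static void main(String[] args) {
        List<RowKey> rows = sample();
        System.out.println("expected order valid: " + isSorted(rows, EXPECTED));
        System.out.println("assumed order valid:  " + isSorted(rows, ASSUMED));
    }
}
```

In this sample the last row has a higher originalTransactionId but a lower bucketProperty than the row before it, so a check that compares bucketProperty first rejects the file even though the rows are in the order the compaction writes. With a single bucketProperty in the file both comparators agree, which matches the observation that the problem only appears once a file contains several bucketProperties.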