Currently, Impala commits an insert first, then reloads the table from HMS and generates the insert events from the difference between the two snapshots (i.e. the files that were not present in the old snapshot but are present in the new one). Hive replication expects the insert events before the commit, so this ordering may lead to issues there.
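A minimal sketch of the current ordering, written in Python purely for illustration. `insert_with_snapshot_diff`, `load_snapshot`, `commit` and `fire_insert_event` are hypothetical names, not Impala's code. A snapshot is modeled as a set of file names, and the insert event is derived by set difference only after the commit:

```python
from typing import Callable, List, Set


def insert_with_snapshot_diff(
    load_snapshot: Callable[[], Set[str]],
    commit: Callable[[], None],
    fire_insert_event: Callable[[Set[str]], None],
) -> Set[str]:
    old_snapshot = load_snapshot()           # file listing before the insert
    commit()                                 # the insert is committed first...
    new_snapshot = load_snapshot()           # ...then the table is reloaded
    new_files = new_snapshot - old_snapshot  # files present only in the new snapshot
    fire_insert_event(new_files)             # so the event is sent after the commit
    return new_files


# Hypothetical in-memory table used to show the resulting order of operations.
table_files: Set[str] = {"part-0.parq"}
log: List[object] = []


def load_snapshot() -> Set[str]:
    return set(table_files)


def commit() -> None:
    table_files.add("part-1.parq")  # the insert adds one new file
    log.append("commit")


def fire_insert_event(files: Set[str]) -> None:
    log.append(("event", sorted(files)))


result = insert_with_snapshot_diff(load_snapshot, commit, fire_insert_event)
```

Here `log` records `"commit"` before the event, which is the ordering Hive replication does not expect.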
The solution is to collect the new files in the backend while the insert is running, and send the insert events based on this file set.
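The proposed approach can be sketched the same way, again in Python for illustration only and with hypothetical names (`insert_with_collected_files`, `write_file`). It assumes the event built from the collected set is fired before the commit, as Hive replication expects; no reload or snapshot diff is needed:

```python
from typing import Callable, List, Set


def insert_with_collected_files(
    rows_per_file: List[List[str]],
    write_file: Callable[[List[str]], str],
    fire_insert_event: Callable[[Set[str]], None],
    commit: Callable[[], None],
) -> Set[str]:
    new_files: Set[str] = set()
    for rows in rows_per_file:
        new_files.add(write_file(rows))  # record each file as it is written
    fire_insert_event(new_files)         # event built from the collected set
    commit()                             # assumed: commit only after the event
    return new_files


# Hypothetical writer and event sink used to show the resulting order.
written: List[str] = []
log: List[object] = []


def write_file(rows: List[str]) -> str:
    name = f"part-{len(written)}.parq"  # hypothetical file naming
    written.append(name)
    return name


def fire_insert_event(files: Set[str]) -> None:
    log.append(("event", sorted(files)))


def commit() -> None:
    log.append("commit")


result = insert_with_collected_files(
    [["r1", "r2"], ["r3"]], write_file, fire_insert_event, commit
)
```

Because the file set comes from the writer itself, the event no longer depends on comparing two table listings.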