Hudi records the commit time after the first action completes. If a heavy transformation runs before isEmpty(), the recorded commit time can be inaccurate.
For example, if I start the Spark job at 201901010000 but isEmpty() runs for 2 hours, the commit time in the .hoodie folder will be 201901010*2*00. If I then use that commit time to ingest data starting from 201901010200 (from HDFS, not using DeltaStreamer), I will miss 2 hours of data.
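To make the arithmetic in this example concrete, here is a minimal sketch in plain Python. It does not call any Hudi API; the helper name `missed_window` and the 12-digit format string are assumptions made for illustration, based only on the timestamps quoted above.

```python
from datetime import datetime, timedelta

# Assumed format: the 12-digit yyyyMMddHHmm timestamps quoted in this report.
TS_FORMAT = "%Y%m%d%H%M"


def missed_window(job_start: str, recorded_commit: str) -> timedelta:
    """Return the gap between when the job started and the commit time
    that ended up being recorded (hypothetical helper, not a Hudi call)."""
    start = datetime.strptime(job_start, TS_FORMAT)
    commit = datetime.strptime(recorded_commit, TS_FORMAT)
    return commit - start


# Job starts at 201901010000, but the commit is stamped 201901010200.
gap = missed_window("201901010000", "201901010200")
print(gap)  # 2:00:00
```

Any data that arrived during that gap falls before the recorded commit time, so a downstream read that starts at the recorded commit time would skip it.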
Is this behavior intended? Can we move the commit time to before isEmpty() runs?