In our organization we are still using Hudi 0.5.0. We plan to upgrade to the latest version in a couple of quarters.
Problem scenario:
Many use cases in our project use COW (copy-on-write) tables with Hive sync disabled. One of the Hudi tables contains two years' worth of data, partitioned by date. For every write to this table, I notice that the "Listing leaf files and directories" job is triggered twice. Normally it is triggered only once. Screenshot attached.
Once the first "Listing leaf files and directories" job is done, the logs for a second "Listing leaf files and directories" job appear.
I spent some time investigating the source code but couldn't trace where exactly it is being invoked.
How can this be avoided? Unfortunately, it is adding more latency to our flow.