Apache Hudi / HUDI-2363

COW: Listing leaf files and directories triggered twice

Details

    Description

      Team,

      In our organization we are still using Hudi 0.5.0. We plan to upgrade to the latest version in a couple of quarters.

      Problem scenario:

      Many use cases in our project use COW tables with Hive sync disabled. One of the Hudi tables contains two years' worth of data, partitioned by date. For every write to this table, I notice that the "Listing leaf files and directories" job is triggered twice. Normally it is triggered only once. Screenshot attached.
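
For reference, here is a minimal sketch of the kind of write configuration described above (COW, partitioned by date, Hive sync disabled), assuming PySpark-style option passing. The table and column names are hypothetical, and the actual job may set other options. The option keys are the Hudi datasource write options as named in the 0.5.x line.

```python
def cow_write_options(table_name, record_key, partition_field, precombine_field):
    """Build Hudi datasource write options for a COW table with Hive sync off."""
    return {
        "hoodie.table.name": table_name,
        # Hudi 0.5.x names this key "storage.type"; later releases use "table.type".
        "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.recordkey.field": record_key,
        # Date-based partitioning, as in the table described above.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Hive sync is disabled in the reported scenario.
        "hoodie.datasource.hive_sync.enable": "false",
    }


# Hypothetical table and column names, for illustration only.
opts = cow_write_options("events", "event_id", "event_date", "updated_at")

# With a live SparkSession and a DataFrame `df`, the write would look like:
#   df.write.format("org.apache.hudi").options(**opts).mode("append").save(base_path)
```

Keeping the options in one dictionary makes it easy to confirm that no Hive sync setting is the source of the second listing.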

       

      Once the first "Listing leaf files and directories" job finishes, the logs show a second listing of leaf files and directories.

      I spent some time investigating the source code but couldn't trace where exactly the second listing is invoked.

       

      How can this be avoided? Unfortunately, the extra listing adds latency to our flow.

       


          People

            shivnarayan sivabalan narayanan
            selvaraj.periyasamy1983@gmail.com selvaraj
            Votes: 0
            Watchers: 3
