Apache Hudi / HUDI-2363

COW : Listing leaf files and directories twice

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Writer Core

      Description

      Team,

      In our organization we are still using Hudi 0.5.0. We plan to upgrade to the latest version in a couple of quarters.

      Problem scenario:

      Many use cases in our project use COW tables, and Hive sync is disabled. One of the Hudi tables contains two years' worth of data, partitioned by date. For every write on this table, I notice that the "Listing leaf files and directories" job is triggered twice. Normally it is triggered only once. Screenshot attached.
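For anyone trying to reproduce this, the setup described above can be sketched roughly as follows. This is only an illustration: the option keys are the Hudi 0.5.x DataSource names as I understand them, the upsert operation is an assumption (the write operation is not stated above), and the table name and field names are hypothetical placeholders.

```python
# Sketch only: option keys follow the Hudi 0.5.x DataSource naming as I understand it.
# Table name, field names, and the upsert operation are hypothetical assumptions.

def cow_write_options(table_name, record_key, partition_field, precombine_field):
    """Build write options for a date-partitioned COW table with Hive sync off."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",  # assumed, not stated in the report
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.hive_sync.enable": "false",
    }


def write_batch(df, base_path, options):
    """Append one batch; `df` is a Spark DataFrame supplied by the caller."""
    df.write.format("org.apache.hudi").options(**options).mode("append").save(base_path)


# Hypothetical names, for illustration only.
opts = cow_write_options("events", "event_id", "event_date", "updated_at")
```

Calling `write_batch(df, base_path, opts)` once per batch against a table with roughly two years of date partitions should be enough to watch for the repeated listing job in the Spark UI.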

       

      Once the first "Listing leaf files and directories" job finishes, the logs for a second listing of leaf files and directories start rolling.

      I spent some time investigating the source code but couldn't trace where exactly it is invoked.
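One way to narrow down the caller: as far as I can tell, "Listing leaf files and directories" is the job description Spark's InMemoryFileIndex sets when it lists many paths in parallel, so both occurrences should be visible as separate jobs. The sketch below picks them out of the job list returned by Spark's monitoring REST API. The sample data, paths, and path count are hypothetical, and `fetch_jobs` assumes the driver UI or history server is reachable.

```python
import json
from urllib.request import urlopen

LISTING_PREFIX = "Listing leaf files and directories"


def fetch_jobs(ui_base_url, app_id):
    """Fetch job metadata from the Spark monitoring REST API (needs a reachable UI)."""
    with urlopen(f"{ui_base_url}/api/v1/applications/{app_id}/jobs") as resp:
        return json.load(resp)


def listing_jobs(jobs):
    """Return jobs whose description marks them as leaf-file listing jobs, by job id."""
    hits = [j for j in jobs if (j.get("description") or "").startswith(LISTING_PREFIX)]
    return sorted(hits, key=lambda j: j["jobId"])


# Hypothetical sample shaped like the REST response; not real output from this table.
sample = [
    {"jobId": 5, "description": LISTING_PREFIX + " for 730 paths:<br/>/data/tbl/2019-08-01, ..."},
    {"jobId": 4, "name": "unrelated job"},
    {"jobId": 3, "description": LISTING_PREFIX + " for 730 paths:<br/>/data/tbl/2019-08-01, ..."},
]
found = listing_jobs(sample)
```

With the two job ids in hand, the job and stage pages in the Spark UI show the call site for each, which may help pin down which code path triggers the second listing.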

       

      How can this be avoided? Unfortunately, it adds more latency to our flow.

       

            People

            • Assignee: Unassigned
            • Reporter: selvaraj (selvaraj.periyasamy1983@gmail.com)
            • Votes: 0
            • Watchers: 2
