  1. Apache Hudi
  2. HUDI-1045

Support updates during clustering

Details

    • Type: Task
    • Status: In Progress
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: clustering, table-service
    • Labels: None

    Description

      We need to allow a writer W to write to file groups f1, f2, f3 concurrently while a clustering service C reclusters them into f4, f5.

      Goals

      • Writes can be updates, deletes, or inserts.
      • Either the clustering service C or the writer W can finish first.
      • Both W and C need to be able to complete their actions without redoing much work.
      • The number of output file groups for C can be higher or lower than the number of input file groups.
      • Needs to work regardless of whether the writers operate in OCC (optimistic concurrency control) or NBCC (non-blocking concurrency control) mode.
      • Needs to interoperate well with the cleaning and compaction services.
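
      To make the scenario concrete, here is a minimal sketch in Python. It is not Hudi code: `ClusteringPlan` and `overlapping_file_groups` are hypothetical names introduced only for illustration. It only identifies which of W's target file groups are also inputs to C, that is, the writes that would have to be reconciled whichever of W or C finishes first. It does not propose how to reconcile them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteringPlan:
    """Hypothetical model of a pending clustering plan for service C."""
    inputs: frozenset   # file groups being replaced, e.g. f1, f2, f3
    outputs: frozenset  # file groups being produced, e.g. f4, f5


def overlapping_file_groups(write_targets, plan):
    """Return the file groups a writer touches that are also clustering inputs."""
    return set(write_targets) & plan.inputs


# C reclusters f1, f2, f3 into f4, f5; the output count may differ from the input count.
plan = ClusteringPlan(
    inputs=frozenset({"f1", "f2", "f3"}),
    outputs=frozenset({"f4", "f5"}),
)

# W updates records in f1 and inserts into a new file group f6 (an assumed example).
w_targets = {"f1", "f6"}

# Only f1 is shared between W and C, so only writes to f1 need reconciling.
print(sorted(overlapping_file_groups(w_targets, plan)))
```

      In this sketch, writes to f6 are unaffected by C, while writes to f1 land in a file group that C is replacing and so fall under the goals above.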

      Non-goals 

      • Strictly preserving the sort order achieved by clustering in the face of updates (e.g., updates that change clustering field values, leaving the output file groups not fully sorted by those fields).

          People

            Assignee: Vinoth Chandar (vinoth)
            Reporter: leesf (xleesf)
            Votes: 0
            Watchers: 3
