  1. Apache Hudi
  2. HUDI-481

Support SQL-like method

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: cli
    • Labels: None

    Description

      As we know, Hudi uses the Spark DataSource API to upsert data. For example, to update a record, we first have to read the old row and then write the modified row back with the upsert method.
      There is another situation, though, where someone only wants to update a single column. Expressed in SQL, this is: update table set col1 = X where col2 = Y. Hudi cannot handle this directly at present; the only option is to load all the affected rows as a dataset first and then merge them back.
      So I think we could create a new subproject that processes batch data through an SQL-like method. For example:
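
A minimal sketch of that workaround, assuming a Spark shell with the Hudi Spark bundle on the classpath. The base path, table name, and the `id`, `ts`, and `partition` field names are hypothetical placeholders, and the exact format name and load path pattern may differ between Hudi versions:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().appName("hudi-update-sketch").getOrCreate()

// Hypothetical base path of an existing Hudi table.
val basePath = "/tmp/hudi/example_table"

// Step 1: read every row matched by the WHERE clause (col2 = Y),
// dropping Hudi's metadata columns so they are not written back as data.
val matched = spark.read.format("hudi").load(basePath)
  .where(col("col2") === "Y")
  .drop("_hoodie_commit_time", "_hoodie_commit_seqno",
        "_hoodie_record_key", "_hoodie_partition_path", "_hoodie_file_name")

// Step 2: apply the SET clause (col1 = X) to the full rows.
val updated = matched.withColumn("col1", lit("X"))

// Step 3: upsert the complete rows back into the table.
updated.write.format("hudi")
  .option("hoodie.table.name", "example_table")                        // assumed table name
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "id")             // assumed key field
  .option("hoodie.datasource.write.precombine.field", "ts")            // assumed ordering field
  .option("hoodie.datasource.write.partitionpath.field", "partition")  // assumed partition field
  .mode(SaveMode.Append)
  .save(basePath)
```

Even though only col1 changes, all three steps operate on whole rows, which is the overhead the proposal aims to hide.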

       

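      // Proposed API only: HudiTable does not exist in Hudi today.
      val hudiTable = new HudiTable(path)
      hudiTable.update.set("col1 = X").where("col2 = Y")  // update table set col1 = X where col2 = Y
      hudiTable.delete.where("col3 = Z")                  // delete from table where col3 = Z
      hudiTable.commit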
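
One way the call chain above could be shaped, as a rough sketch only: every type here (`HudiTable`, `Op`, `Update`, `Delete`) is hypothetical and nothing in it touches Hudi or Spark. It assumes operations are queued in order and handed over together on `commit`; a real implementation would translate each queued operation into a read, filter, and upsert or delete against the table.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical operation model; not part of Hudi.
sealed trait Op
final case class Update(assignment: String, condition: String) extends Op
final case class Delete(condition: String) extends Op

// Hypothetical fluent wrapper around a table base path.
final class HudiTable(val path: String) {
  private val pending = ArrayBuffer.empty[Op]

  final class UpdateBuilder {
    def set(assignment: String): SetBuilder = new SetBuilder(assignment)
  }
  final class SetBuilder(assignment: String) {
    // Queue the update once its WHERE clause is known.
    def where(condition: String): Unit = { pending += Update(assignment, condition); () }
  }
  final class DeleteBuilder {
    def where(condition: String): Unit = { pending += Delete(condition); () }
  }

  def update: UpdateBuilder = new UpdateBuilder
  def delete: DeleteBuilder = new DeleteBuilder

  // Return the queued operations in order and clear the queue.
  // A real implementation would apply them to the table as one commit.
  def commit: List[Op] = {
    val ops = pending.toList
    pending.clear()
    ops
  }
}

object HudiTableSketch {
  def main(args: Array[String]): Unit = {
    val hudiTable = new HudiTable("/tmp/hudi/example_table") // hypothetical path
    hudiTable.update.set("col1 = X").where("col2 = Y")
    hudiTable.delete.where("col3 = Z")
    hudiTable.commit.foreach(println)
  }
}
```

Queuing operations until `commit` keeps the update and the delete in a single unit of work, which matches the single `commit` call in the proposal.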
      

      This could later be extended to support JDBC-like schemes, such as the one in RFC-14 (JDBC incremental puller): https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller

      I hope everyone can offer suggestions on whether this plan is feasible.

            People

              Assignee: Unassigned
              Reporter: chenxiang (cdmikechen)
              Votes: 0
              Watchers: 6
